IMS Collections 

Pushing the Limits of Contemporary Statistics: Contributions in Honor of 
Jayanta K. Ghosh 
Vol. 3 (2008) 200-222 

© Institute of Mathematical Statistics, 2008 
DOI: 10.1214/074921708000000156 



Reproducing kernel Hilbert spaces of 
Gaussian priors 

A. W. van der Vaart and J. H. van Zanten

Vrije Universiteit Amsterdam 

Abstract: We review definitions and properties of reproducing kernel Hilbert 
spaces attached to Gaussian variables and processes, with a view to applica- 
tions in nonparametric Bayesian statistics using Gaussian priors. The rate 
of contraction of posterior distributions based on Gaussian priors can be de- 
scribed through a concentration function that is expressed in the reproducing 
Hilbert space. Absolute continuity of Gaussian measures and concentration 
inequalities play an important role in understanding and deriving this result. 
Series expansions of Gaussian variables and transformations of their reproducing
kernel Hilbert spaces under linear maps are useful tools to compute the
concentration function. 
1. Introduction



Ghosal, Ghosh and van der Vaart considered in [4] the rate of contraction of a
posterior distribution based on i.i.d. observations to the true density. Given prior
probability measures $\Pi_n$ defined on a set $\mathcal{P}$ of densities $p$ relative to a given
$\sigma$-finite measure on a measurable space (such that the maps $(x,p) \mapsto p(x)$ are jointly
measurable) and observations $X_1, \ldots, X_n$, they characterized the rate $\varepsilon_n \downarrow 0$ at
which the posterior distribution



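(1.1)  $$\Pi_n(B \mid X_1,\ldots,X_n) = \frac{\int_B \prod_{i=1}^n p(X_i)\,d\Pi_n(p)}{\int \prod_{i=1}^n p(X_i)\,d\Pi_n(p)}$$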



contracts to $p_0$ if the observations are an i.i.d. sample from this density, i.e. the
rate for which

$$\mathrm{E}_{p_0}\Pi_n\bigl(p: d(p,p_0) > M\varepsilon_n \mid X_1,\ldots,X_n\bigr) \to 0,$$

for sufficiently large $M$. In their results $d$ can be the Hellinger distance, the
$L_1$-distance, or the $L_2$-distance if the densities are uniformly bounded above.

The paper [15] applied these results to priors $\Pi_n$ constructed from Gaussian
processes. They consider a prior $\Pi_n$ constructed as the distribution of $p_W$, for $W$
a Gaussian random element in a Banach space $(\mathbb{B}, \|\cdot\|)$ and $w \mapsto p_w$ a map such
that, for some constant $C$ and all $v,w \in \mathbb{B}$ with $\|v - w\|$ bounded above by some
fixed constant,

$$d(p_v,p_w) \le C\|v-w\|,$$
$$K(p_v,p_w) \le C\|v-w\|^2,$$
$$V(p_v,p_w) \le C\|v-w\|^2.$$

Here $K(p,q) = \int \log(p/q)\,p\,d\mu$ is the Kullback-Leibler divergence and
$V(p,q) = \int \bigl(\log(p/q)\bigr)^2 p\,d\mu$. This setting covers, for instance, the case of
density estimation on $[0,1]$ as considered in Tokdar and Ghosh [14], with $d$ the
Hellinger distance, the Banach space equal to $\mathbb{B} = C[0,1]$ and

$$p_w(x) = \frac{e^{w(x)}}{\int_0^1 e^{w(y)}\,dy}.$$

It also covers logistic or probit regression as considered in [5] with appropriate 
choices and several other situations, as shown in [15]. 

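In the latter paper it is shown that if the true density takes the form $p_0 = p_{w_0}$,
then the rate of posterior contraction $\varepsilon_n$ is characterized by the pair of equations

(1.2)  $$\inf_{h \in \mathbb{H}:\, \|h - w_0\| < \varepsilon_n} \|h\|_{\mathbb{H}}^2 \le n\varepsilon_n^2,$$

(1.3)  $$-\log P\bigl(\|W\| < \varepsilon_n\bigr) \le n\varepsilon_n^2.$$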



Here $(\mathbb{H}, \|\cdot\|_{\mathbb{H}})$ is the reproducing kernel Hilbert space (RKHS) of the Gaussian
variable $W$, and $P(\|W\| < \varepsilon)$ is its small ball probability (cf. [11]). Both equations
have a minimal solution $\varepsilon_n$, and the rate is the worse of the two solutions. The
second depends only on the prior, and gives a maximal rate regardless of the true
parameter $w_0$, whereas the first involves the true parameter.
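
As a numerical illustration of (1.3) alone, the following sketch finds the minimal $\varepsilon$ with $\varphi_0(\varepsilon) \le n\varepsilon^2$ by bisection, where $\varphi_0(\varepsilon) = -\log P(\|W\| < \varepsilon)$. The choice $\varphi_0(\varepsilon) = \pi^2/(8\varepsilon^2)$ is an assumption made for illustration only: it is the known leading-order small ball behaviour of Brownian motion in the uniform norm on $[0,1]$, which is not treated in this section. The names `minimal_rate` and `phi_bm` are hypothetical. Equation (1.2), which involves $w_0$, is not addressed.

```python
import math

def minimal_rate(phi, n, lo=1e-8, hi=1.0, iters=200):
    """Smallest eps in [lo, hi] with phi(eps) <= n * eps**2, found by bisection.

    Assumes phi is decreasing, so phi(eps) - n * eps**2 changes sign once.
    """
    def gap(eps):
        return phi(eps) - n * eps ** 2

    if gap(hi) > 0:
        raise ValueError("no solution below hi; increase hi")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid          # inequality fails at mid: solution is larger
        else:
            hi = mid          # inequality holds at mid: try smaller eps
    return hi

def phi_bm(eps):
    # Illustrative small ball exponent (assumption): pi^2 / (8 eps^2).
    return math.pi ** 2 / (8.0 * eps ** 2)

n = 10_000
eps_n = minimal_rate(phi_bm, n)
closed_form = (math.pi ** 2 / (8.0 * n)) ** 0.25
```

Under this assumption the solution is $\varepsilon_n = (\pi^2/(8n))^{1/4}$, of the order $n^{-1/4}$, which the bisection reproduces.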

The reproducing kernel Hilbert space arises because it determines the support 
and the "geometry" of the concentration of the Gaussian measure, which are crucial 
for its success as a prior. Results on RKHSs of Gaussian variables are spread over 
many research papers, and sometimes seem to belong to what is "well known" with- 
out clear references. Moreover, there are different definitions for stochastic processes 
and Borel measurable maps in a separable Banach space. In this paper we review 
definitions, investigate when the different definitions agree, and derive results that 
are useful for the construction of priors and the study of posterior distributions. 
2. Definitions and elementary properties

In this section we give and compare two definitions of RKHS, one for stochastic
processes and one for Borel measurable maps in a Banach space.

2.1. Gaussian processes 

A zero-mean Gaussian stochastic process $W = (W_t: t \in T)$ is a set of random
variables $W_t$ indexed by an arbitrary set $T$ and defined on a common probability space
$(\Omega, \mathcal{U}, P)$ such that each finite subset possesses a zero-mean multivariate normal
distribution. The finite-dimensional distributions of such a process are determined
by the covariance function $K: T \times T \to \mathbb{R}$, defined by

$$K(s,t) = \mathrm{E}W_sW_t.$$
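
Because the covariance function determines every finite-dimensional distribution, a finite section of the process can be simulated from $K$ alone. The sketch below assumes the Brownian motion kernel $K(s,t) = \min(s,t)$ as an example (it is not singled out in the text) and uses a Cholesky factor of the Gram matrix; the function names are hypothetical.

```python
import numpy as np

def brownian_kernel(s, t):
    """Covariance K(s, t) = min(s, t) of standard Brownian motion (example choice)."""
    return np.minimum(s, t)

def sample_paths(kernel, ts, n_paths, rng):
    """Draw zero-mean Gaussian vectors (W_t: t in ts) with covariance given by kernel."""
    gram = kernel(ts[:, None], ts[None, :])
    chol = np.linalg.cholesky(gram)              # gram = chol @ chol.T
    z = rng.standard_normal((len(ts), n_paths))  # i.i.d. standard normal draws
    return gram, chol @ z                        # each column is one sampled vector

ts = np.linspace(0.1, 1.0, 10)                   # strictly positive times: gram is invertible
rng = np.random.default_rng(0)
gram, paths = sample_paths(brownian_kernel, ts, 20000, rng)
empirical = paths @ paths.T / paths.shape[1]     # Monte Carlo estimate of the Gram matrix
```

The empirical second-moment matrix of the draws approximates the Gram matrix $(K(t_i,t_j))$ up to Monte Carlo error.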

The reproducing kernel Hilbert space (RKHS) attached to the Gaussian process $W$
is the completion $\mathbb{H}$ of the linear space of all functions

(2.1)  $$t \mapsto \sum_{i=1}^k \alpha_i K(s_i,t), \qquad \alpha_1,\ldots,\alpha_k \in \mathbb{R},\ s_1,\ldots,s_k \in T,\ k \in \mathbb{N},$$

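relative to the norm induced by the inner product

(2.2)  $$\Bigl\langle \sum_{i=1}^k \alpha_i K(s_i,\cdot), \sum_{j=1}^l \beta_j K(t_j,\cdot)\Bigr\rangle_{\mathbb{H}} = \sum_{i=1}^k \sum_{j=1}^l \alpha_i\beta_j K(s_i,t_j).$$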

It can be checked that this definition is independent of the representation of the 
functions on the left, and that this defines a valid inner product. 

The completion of the collection of functions (2.1) is an abstract metric-topological
operation using the metric induced by the inner product (2.2) only. As such
the completion is not a space of functions $f: T \to \mathbb{R}$. However, it can be identified
with a space of functions $f: T \to \mathbb{R}$, through the reproducing formula

$$f(t) = \langle f, K(t,\cdot)\rangle_{\mathbb{H}}.$$

For $f$ a linear combination of the form $\sum_{i=1}^k \alpha_i K(s_i,\cdot)$ this formula follows from the
definition (2.2) of the inner product $\langle\cdot,\cdot\rangle_{\mathbb{H}}$. For general $f \in \mathbb{H}$ the (extended) inner
product on the right (with the extended function $K(t,\cdot)$) is well defined through
the completion operation, and can be used to define a function $f: T \to \mathbb{R}$.
Alternatively, the function in (2.1) can be written as

(2.3)  $$t \mapsto \mathrm{E}W_tH, \qquad H = \sum_i \alpha_i W_{s_i}.$$

With the function in the display written as $\mathrm{E}W_\cdot H$, the inner product (2.2) is equal
to

$$\langle \mathrm{E}W_\cdot H_1, \mathrm{E}W_\cdot H_2\rangle_{\mathbb{H}} = \mathrm{E}H_1H_2.$$

Thus the map $H \mapsto \mathrm{E}W_\cdot H$ is an isometry for the norm of the $L_2$-space attached to
the probability space $(\Omega,\mathcal{U},P)$ on which the process $W$ is defined and the RKHS-norm.
The stochastic process RKHS $\mathbb{H}$, which is defined as the completion of the
set of functions (2.3), is therefore precisely the set of functions $t \mapsto \mathrm{E}W_tH$ with
$H$ ranging over the closure of the set of linear combinations $H = \sum_i \alpha_i W_{s_i}$ in
$L_2(\Omega,\mathcal{U},P)$ (known as the first order chaos of $W$). It follows again that we can
view $\mathbb{H}$ as a Hilbert space of functions on $T$.
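
For finite linear combinations the inner product (2.2) and the reproducing formula are plain matrix computations. The following sketch checks the reproducing formula numerically; the kernel $\min(s,t)$, the coefficients and the point `t0` are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

def kernel(s, t):
    """Example covariance function K(s, t) = min(s, t)."""
    return np.minimum(s, t)

def rkhs_inner(alpha, s, beta, t):
    """Inner product (2.2) of sum_i alpha_i K(s_i, .) and sum_j beta_j K(t_j, .)."""
    return alpha @ kernel(s[:, None], t[None, :]) @ beta

def evaluate(alpha, s, t):
    """Value at the point t of the function sum_i alpha_i K(s_i, .) from (2.1)."""
    return alpha @ kernel(s, t)

alpha = np.array([1.0, -2.0, 0.5])
s = np.array([0.2, 0.5, 0.9])
t0 = 0.7

lhs = evaluate(alpha, s, t0)                                   # f(t0)
rhs = rkhs_inner(alpha, s, np.array([1.0]), np.array([t0]))    # <f, K(t0, .)>_H
sq_norm = rkhs_inner(alpha, s, alpha, s)                       # ||f||_H^2
```

Both sides of the reproducing formula agree, and the squared norm is positive because the Gram matrix of the kernel at distinct positive points is positive definite.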
2.2. Gaussian elements in a Banach space

A Borel measurable random element $W$ with values in a separable Banach space
$(\mathbb{B}, \|\cdot\|)$ is called Gaussian if the random variable $b^*W$ is normally distributed for
any element $b^*$ of the dual space $\mathbb{B}^*$ of $\mathbb{B}$, and it is called zero-mean if the mean
of every such variable $b^*W$ is zero. Henceforth we shall only consider zero-mean
Gaussian variables.

It is well known that the norm $\|W\|$ of a zero-mean Gaussian variable, which is
a finite random variable by the assumption that $W$ takes its values in $\mathbb{B}$, has
sub-Gaussian tails (cf. Corollary 5.1 below, or, e.g., [17], Propositions A.2.1 and A.2.3,
for a direct proof). In particular, all moments $\mathrm{E}\|W\|^p$ are finite. We set

$$\sigma^2(W) = \sup_{b^* \in \mathbb{B}^*:\, \|b^*\| = 1} \mathrm{E}\,b^*(W)^2.$$

This is a finite number, bounded by $\mathrm{E}\|W\|^2$.

For every element $b^* \in \mathbb{B}^*$ we define $Sb^* \in \mathbb{B}$ as the Pettis integral $\mathrm{E}Wb^*(W)$
of the $\mathbb{B}$-valued random element $Wb^*(W)$. By definition, this Pettis integral is an
element $Sb^*$ of $\mathbb{B}$ such that $b_2^*(Sb^*) = \mathrm{E}b_2^*(W)b^*(W)$ for every $b_2^* \in \mathbb{B}^*$. The following
lemma allows us to derive the existence of the Pettis integral from the fact that
$\mathrm{E}\|W\|^2 < \infty$.

Lemma 2.1. If $X$ is a Borel measurable map in a separable Banach space $\mathbb{B}$ with
$\mathrm{E}\|X\| < \infty$, then there exists an element $b \in \mathbb{B}$ such that $b^*(b) = \mathrm{E}b^*(X)$ for every
$b^* \in \mathbb{B}^*$.

Proof. Because the Banach space is assumed separable, the map $X$ is automatically
tight (e.g. [17], 1.3.2). Therefore, for any $n \in \mathbb{N}$ there exists a compact set $K$ such
that $\mathrm{E}\|X\|1_{X \notin K} < 1/n$. This compact set can be partitioned into finitely many
sets $B_i$ of diameter smaller than $1/n$. Without loss of generality these partitions
can be chosen as successive refinements for increasing $n$. Let $X_n = \sum_i b_i 1_{X \in B_i}$
for $b_i$ arbitrary points in the partitioning sets. Then $\mathrm{E}X_n := \sum_i b_i P(X \in B_i)$
satisfies $b^*(\mathrm{E}X_n) = \mathrm{E}b^*(X_n)$ for every $b^* \in \mathbb{B}^*$. Furthermore, the sequence $\mathrm{E}X_n$ is
a Cauchy sequence in $\mathbb{B}$, because $\|\mathrm{E}X_n - \mathrm{E}X_m\| = \sup_{\|b^*\| \le 1} |\mathrm{E}b^*(X_n - X_m)| \le
\mathrm{E}\|X_n - X_m\| \to 0$ as $n,m \to \infty$. Because $\mathrm{E}\|X_n - X\| < 2/n$, we have that
$b^*(\mathrm{E}X_n) = \mathrm{E}b^*(X_n) \to \mathrm{E}b^*(X)$ for every $b^* \in \mathbb{B}^*$. The strong limit $b$ of the sequence
$\mathrm{E}X_n$ is of course also a weak limit, whence $b^*(b) = \mathrm{E}b^*(X)$ for every $b^* \in \mathbb{B}^*$. □

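The reproducing kernel Hilbert space (RKHS) $\mathbb{H}$ attached to $W$ is the completion
of the range $S\mathbb{B}^*$ of the map $S: \mathbb{B}^* \to \mathbb{B}$ defined by $Sb^* = \mathrm{E}Wb^*(W)$ for the inner
product

$$\langle Sb_1^*, Sb_2^*\rangle_{\mathbb{H}} = \mathrm{E}b_1^*(W)b_2^*(W).$$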
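By the Hahn-Banach theorem and the Cauchy-Schwarz inequality,

(2.4)  $$\|Sb^*\| = \sup_{b_2^* \in \mathbb{B}^*:\, \|b_2^*\| = 1} |b_2^*(Sb^*)| = \sup_{b_2^* \in \mathbb{B}^*:\, \|b_2^*\| = 1} |\mathrm{E}b_2^*(W)b^*(W)| \le \sigma(W)\bigl(\mathrm{E}b^*(W)^2\bigr)^{1/2} = \sigma(W)\|Sb^*\|_{\mathbb{H}}.$$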

It follows that the RKHS-norm on the set $S\mathbb{B}^*$ is stronger than the original norm,
so that a $\|\cdot\|_{\mathbb{H}}$-Cauchy sequence in $S\mathbb{B}^* \subset \mathbb{B}$ is a $\|\cdot\|$-Cauchy sequence in $\mathbb{B}$.
Consequently, the RKHS, which is by definition the completion of the set $S\mathbb{B}^*$
under the RKHS norm, can be identified with a subset of $\mathbb{B}$. In terms of the unit
balls $\mathbb{B}_1$ and $\mathbb{H}_1$ of $\mathbb{B}$ and $\mathbb{H}$ the preceding display can be written as

(2.5)  $$\mathbb{H}_1 \subset \sigma(W)\mathbb{B}_1.$$

In other words, the norm of the embedding $i: \mathbb{H} \to \mathbb{B}$ is bounded by $\sigma(W)$.

Lemma 2.2. The map $S: \mathbb{B}^* \to \mathbb{H}$ is weak-* continuous.

Proof. The unit ball $\mathbb{B}_1^*$ of the dual space is weak-* metrizable ([12], 3.16). Therefore
the restricted map $S: \mathbb{B}_1^* \to \mathbb{H}$ is weak-* continuous if and only if weak-* convergence
of a sequence $b_n^*$ in $\mathbb{B}_1^*$ to an element $b^*$ implies that $Sb_n^* \to Sb^*$ in $\mathbb{H}$. Now the weak-*
convergence $b_n^* \to b^*$ is by definition pointwise convergence on $\mathbb{B}$. Then the sequence
$(b_n^* - b^*)(W)$ tends to zero (almost) surely, and hence also in distribution. Because
each of these variables is zero-mean Gaussian, this implies that the variances tend
to zero, i.e. $\|Sb_n^* - Sb^*\|_{\mathbb{H}}^2 = \mathrm{E}(b_n^* - b^*)^2(W) \to 0$. (Alternatively, use the uniform
integrability of the variables $b^*W$ instead of the Gaussianity.)

This concludes the proof that the restriction of $S$ to the unit ball $\mathbb{B}_1^*$ is continuous.
A weak-* converging net $b_n^*$ in $\mathbb{B}^*$ is necessarily bounded in norm, by the
Banach-Steinhaus theorem ([12], 2.5), and hence is contained in a multiple of the unit ball.
The continuity of the restriction then shows that $Sb_n^* \to Sb^*$, which concludes the
proof. □

Corollary 2.1. If $\mathbb{B}_0^*$ is a weak-* dense subset of $\mathbb{B}^*$, then $\mathbb{H}$ is the completion of
$S\mathbb{B}_0^*$.

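By the definitions $\langle Sb_1^*, Sb_2^*\rangle_{\mathbb{H}} = \mathrm{E}b_1^*Wb_2^*W = b_1^*(Sb_2^*)$, for any $b_1^*, b_2^* \in \mathbb{B}^*$. By
continuity of the inner product this extends to the reproducing formula:

(2.6)  $$\langle Sb^*, h\rangle_{\mathbb{H}} = b^*(h),$$

which is valid for every $h \in \mathbb{H}$ and $b^* \in \mathbb{B}^*$.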

Just as for stochastic processes there is an alternative representation of the RKHS
through "first chaos", in the present setting defined as the closed linear span of the
variables $b^*W$ in $L_2(\Omega,\mathcal{U},P)$. The elements $Sb^*$ of the RKHS can be written $Sb^* =
\mathrm{E}HW$ for $H = b^*W$, and the RKHS-norm of $Sb^*$ is by definition the $L_2(\Omega,\mathcal{U},P)$-norm
of this $H$. This immediately implies the following lemma. Note that $\mathrm{E}HW$ is
well defined as a Pettis integral for every $H \in L_2(\Omega,\mathcal{U},P)$, by Lemma 2.1.

Lemma 2.3. The RKHS is the set of Pettis integrals $\mathrm{E}HW$ for $H$ ranging over the
closed linear span of the variables $b^*W$ in $L_2(\Omega,\mathcal{U},P)$ with inner product
$\langle \mathrm{E}H_1W, \mathrm{E}H_2W\rangle_{\mathbb{H}} = \mathrm{E}H_1H_2$.

It is useful to decompose the map $S: \mathbb{B}^* \to \mathbb{B}$ as $S = A^*A$ for $A^*: L_2(\Omega,\mathcal{U},P) \to \mathbb{B}$
and $A: \mathbb{B}^* \to L_2(\Omega,\mathcal{U},P)$ given by

$$A^*H = \mathrm{E}HW,$$
$$Ab^* = b^*W.$$

It may be checked that the operators $A$ and $A^*$ are indeed adjoints, after identifying
$\mathbb{B}$ with a subset of its second dual space $\mathbb{B}^{**}$ under the canonical embedding ([12],
3.15, 4.5), as the notation suggests. By the preceding lemma the RKHS is the
image of the first chaos space under $A^*$. Because $R(A)^\perp = N(A^*)$ the full range
$R(A^*) = A^*\bigl(L_2(\Omega,\mathcal{U},P)\bigr)$ is not bigger than the image of the first chaos, although
the map $A^*: L_2(\Omega,\mathcal{U},P) \to \mathbb{H}$ is an isometry only if restricted to the first chaos
space.

Recall that an operator is compact if it maps bounded sets into precompact sets, 
or, equivalently, maps bounded sequences into sequences that possess a converging 
subsequence. 

Lemma 2.4. The maps $A^*: L_2(\Omega,\mathcal{U},P) \to \mathbb{B}$ and $A: \mathbb{B}^* \to L_2(\Omega,\mathcal{U},P)$ and $S: \mathbb{B}^* \to
\mathbb{B}$ are compact for the norms.

Proof. In general an operator is compact if and only if its adjoint is compact,
and a composition with a compact operator is compact (see [12], 4.19). To prove
the compactness of $A$ fix some sequence $b_n^*$ in the unit ball $\mathbb{B}_1^*$. As the unit ball
is weak-* compact by the Banach-Alaoglu theorem ([12], 4.3(c)), there exists a
subsequence along which $b_{n_j}^*$ converges pointwise on $\mathbb{B}$ to a limit $b^*$. Consequently
$b_{n_j}^*(W) \to b^*(W)$ almost surely, and hence in second mean. □

As a consequence we can conclude that the unit ball of the RKHS is precompact
in $\mathbb{B}$. Indeed, $\mathbb{H}_1 = A^*U_1$ for $U_1$ the unit ball of $L_2(\Omega,\mathcal{U},P)$, and hence is precompact
by the compactness of $A^*$.

Example 2.1 (Hilbert space). The covariance operator of a mean zero Gaussian
random element $W$ in a Hilbert space $\mathbb{B}$ with inner product $\langle\cdot,\cdot\rangle$ is the map $S: \mathbb{B} \to \mathbb{B}$
that satisfies $\mathrm{E}\langle W,b_1\rangle\langle W,b_2\rangle = \langle b_1, Sb_2\rangle$. It is well known that $S$ is continuous,
linear, positive, self-adjoint, and of finite trace, and hence it possesses a square root,
which is another positive, self-adjoint operator $S^{1/2}: \mathbb{B} \to \mathbb{B}$ such that $S^{1/2}S^{1/2} = S$.
(The square root can also be described as having the same eigenfunctions as $S$ with
eigenvalues the square roots of the eigenvalues of $S$.) The RKHS of $W$ can be
characterized as the range of $S^{1/2}$ equipped with the norm $\|S^{1/2}b\|_{\mathbb{H}} = \|b\|$.

To see this note that the covariance operator $S$ is exactly the operator $S$ as
defined previously, after the usual identification of the dual space $\mathbb{B}^*$ with $\mathbb{B}$ itself:
$b \in \mathbb{B}$ corresponds to the element $b_1 \mapsto \langle b, b_1\rangle$ of $\mathbb{B}^*$. Hence the RKHS is the
completion of the elements $Sb$ under the square norm $\|Sb\|_{\mathbb{H}}^2 = \mathrm{E}\langle W,b\rangle^2 = \langle b, Sb\rangle =
\|S^{1/2}b\|^2$. This is the same as the completion of the set of functions $S^{1/2}c$ (with
$c = S^{1/2}b$) under the norm $\|S^{1/2}c\|_{\mathbb{H}} = \|c\|$. The latter set is of course already
complete, so that completion is superfluous.
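
In finite dimensions Example 2.1 can be checked directly. The sketch below takes $\mathbb{B} = \mathbb{R}^4$ with a randomly generated positive definite covariance matrix (an illustrative assumption), in which case the definition $\langle Sb_1, Sb_2\rangle_{\mathbb{H}} = \langle b_1, Sb_2\rangle$ gives $\|h\|_{\mathbb{H}}^2 = h^TS^{-1}h$. It verifies $\|S^{1/2}b\|_{\mathbb{H}} = \|b\|$ and the embedding bound (2.5), where $\sigma^2(W)$ is the largest eigenvalue of $S$. The name `rkhs_norm` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4))
S = a @ a.T + 0.1 * np.eye(4)           # covariance operator on R^4, positive definite

# Square root: same eigenvectors as S, eigenvalues replaced by their square roots.
eigval, eigvec = np.linalg.eigh(S)
S_half = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T

def rkhs_norm(h):
    """RKHS norm of h in R^4: ||h||_H^2 = h^T S^{-1} h."""
    return np.sqrt(h @ np.linalg.solve(S, h))

b = rng.standard_normal(4)
norm_image = rkhs_norm(S_half @ b)      # ||S^{1/2} b||_H
norm_b = np.linalg.norm(b)              # ||b||

# Embedding bound (2.5): ||h|| <= sigma(W) ||h||_H.
h = S @ b
sigma = np.sqrt(eigval.max())
```

Here every vector belongs to the RKHS because $S$ is invertible; in infinite dimensions the range of $S^{1/2}$ is a strict subset of $\mathbb{B}$.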

2.3. Comparison

If the sample paths $t \mapsto W_t$ of a stochastic process $W = (W_t: t \in T)$ belong to
a Banach space of functions, then the process can be viewed as a map $W$ into
the Banach space. If it is a Borel measurable map, then the preceding gives two
definitions of a RKHS. The two definitions will coincide provided the dual space
can be appropriately related to the covariance function. In particular, if the
coordinate projections $\pi_t: \mathbb{B} \to \mathbb{R}$, defined by $b \mapsto b(t)$, are elements of the dual space,
then $W_t = \pi_t(W)$ and the covariance function $K(s,t) = \mathrm{E}W_sW_t$ takes the form
$\mathrm{E}\pi_s(W)\pi_t(W) = \langle S\pi_s, S\pi_t\rangle_{\mathbb{H}}$. If the other elements $Sb^*$ are determined by the
elements $S\pi_t$, then the two definitions should be the same. It appears that in general
some conditions are needed to make the link between the two definitions. For the
Banach space $\ell^\infty(T)$ of uniformly bounded functions $z: T \to \mathbb{R}$, equipped with the
uniform norm $\|z\| = \sup\{|z(t)|: t \in T\}$, this can always be done.

The following result is probably known to the experts, but we do not know a 
published reference. 
Theorem 2.1. If $W$ is a Borel measurable zero-mean Gaussian random element
in a complete separable subspace of $\ell^\infty(T)$ equipped with the uniform norm, then
the Banach space RKHS and the stochastic process RKHS coincide. Furthermore
$S\pi_t = K(t,\cdot)$.

Proof. For a given tight Borel measurable random element $W$ in $\ell^\infty(T)$ there exists
a semimetric $\rho$ on $T$ under which $T$ is totally bounded and such that $W$ takes
its values in the subspace $UC(T,\rho)$ of functions $f: T \to \mathbb{R}$ that are uniformly
continuous relative to $\rho$ (e.g. [17], Lemma 1.5.9). Thus we may assume without loss
of generality that $W$ takes its values in $UC(T,\rho)$ for such a semimetric $\rho$. The space
$UC(T,\rho)$ is a Banach space under the supremum norm $\|f\| = \sup\{|f(t)|: t \in T\}$.
Let $K(s,t) = \mathrm{E}W_sW_t$.

The coordinate projections $\pi_t: f \mapsto f(t)$ belong to the dual space $UC(T,\rho)^*$. The
corresponding Pettis integral $S\pi_t$ is the function $K(t,\cdot)$. This follows because it is
contained in $UC(T,\rho)$ and, furthermore, for every $s \in T$,

$$\pi_s\bigl(K(t,\cdot)\bigr) = K(t,s) = \mathrm{E}W_sW_t = \mathrm{E}\pi_s(W)\pi_t(W).$$

Because the values $\pi_sf$ of the coordinate projections identify $f$ uniquely it follows that
$K(t,\cdot) = \mathrm{E}W\pi_t(W) = S\pi_t$.

Thus the stochastic process RKHS, defined as the completion of the linear
combinations (2.1), is contained in the Banach space RKHS. The inner products on the
two spaces agree, because

$$\langle S\pi_s, S\pi_t\rangle_{\mathbb{H}} = \mathrm{E}\pi_t(W)\pi_s(W) = K(s,t) = \langle K(s,\cdot), K(t,\cdot)\rangle_{\mathbb{H}}.$$

By the Riesz representation theorem an arbitrary element of $UC(T,\rho)^*$ is a map
$f \mapsto \int \bar f(t)\,d\mu(t)$ for $\mu$ a signed Borel measure on the completion $\bar T$ of $T$, where
$\bar f: \bar T \to \mathbb{R}$ is the continuous extension of $f$. Because $T$ is totally bounded we can write it for
each $m \in \mathbb{N}$ as a finite union of sets of diameter smaller than $1/m$. If we define $\mu_m$
as the measure obtained by concentrating the masses of $\mu$ on the partitioning sets
in a fixed, single point in the partitioning set, then $\int \bar f\,d\mu_m \to \int \bar f\,d\mu$, as $m \to \infty$,
for each $f \in UC(T,\rho)$. The map $f \mapsto \int \bar f\,d\mu_m$ is a linear combination of coordinate
projections. It follows that for any $b^* \in UC(T,\rho)^*$ there exists a sequence $b_m^*$ of
linear combinations of coordinate projections that converges pointwise on $UC(T,\rho)$
to $b^*$. In other words, the linear span $\mathbb{B}_0^*$ of coordinate projections is weak-* dense
in $UC(T,\rho)^*$, and hence the RKHS is the completion of $S\mathbb{B}_0^*$, by Corollary 2.1. □

Example 2.2. The preceding theorem applies, for instance, to the space of
continuous functions $z: T \to \mathbb{R}$ on a compact metric space $T$, for instance $C[0,1]$.

A more general connection between the two definitions of a RKHS can be made
by embedding the Banach space $\mathbb{B}$ in its second dual (see [12], 4.15). This is
somewhat technical and will not be needed in the rest of the paper. The canonical
embedding is, as usual, the identification of $b \in \mathbb{B}$ with the map $b^{**}: \mathbb{B}^* \to \mathbb{R}$
defined by $b^{**}(b^*) = b^*(b)$. A Borel measurable random element $W$ in $\mathbb{B}$ becomes
identified in this way with the stochastic process $W^{**} = (b^*(W): b^* \in \mathbb{B}^*)$, which
has covariance function

$$K(b_1^*, b_2^*) = \mathrm{E}b_1^*(W)b_2^*(W).$$

The stochastic process RKHS $\mathbb{H}$ attached to this process in Section 2.1 is the
completion of the set of functions $K(b^*,\cdot): \mathbb{B}^* \to \mathbb{R}$ relative to the inner product

$$\langle K(b_1^*,\cdot), K(b_2^*,\cdot)\rangle_{\mathbb{H}} = K(b_1^*, b_2^*) = \mathrm{E}b_1^*(W)b_2^*(W).$$

The function $K(b^*,\cdot)$ is exactly the Pettis integral $\mathrm{E}Wb^*(W)$, written $Sb^*$ in the
preceding and now viewed as an element of $\mathbb{B}^{**}$; and the inner product in the
display is exactly $\langle Sb_1^*, Sb_2^*\rangle_{\mathbb{H}}$. Thus the two definitions of RKHS coincide, after
identification of $\mathbb{B}$ and its image in $\mathbb{B}^{**}$ under the canonical embedding.



3. Absolute continuity 

Given a zero-mean Gaussian process $W = (W_t: t \in T)$ with covariance kernel $K$
defined on a probability space $(\Omega,\mathcal{U},P)$ with RKHS $\mathbb{H}$ as defined in Section 2.1, we
can define a map $U: \mathbb{H} \to L_2(\Omega,\mathcal{U},P)$ by defining

(3.1)  $$UK(t,\cdot) = W_t,$$

and extending linearly and continuously. This map is a Hilbert space isometry,
since

$$\mathrm{E}\,UK(s,\cdot)\,UK(t,\cdot) = \mathrm{E}W_sW_t = K(s,t) = \langle K(s,\cdot), K(t,\cdot)\rangle_{\mathbb{H}}.$$

This isometry property also implies the existence of the extension. It follows that
the process $(Uh: h \in \mathbb{H})$ is the iso-Gaussian process indexed by $\mathbb{H}$: a mean-zero
Gaussian process with covariance function $\mathrm{E}\,Ug\,Uh = \langle g,h\rangle_{\mathbb{H}}$.

The process $W$ induces a distribution $P^W$ on the product $\sigma$-field of $\mathbb{R}^T$. For
a function $f: T \to \mathbb{R}$ the process $(W_t + f(t): t \in T)$ induces another distribution
$P^{W+f}$ on the same space.

Lemma 3.1. If $f \in \mathbb{H}$, then $P^{W+f}$ and $P^W$ are equivalent and

$$\frac{dP^{W+f}}{dP^W}(W) = e^{Uf - \frac12\|f\|_{\mathbb{H}}^2}, \qquad \text{a.s.}$$

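Proof. The process $W$ is the "subprocess" $W^G = (Ug: g \in G)$ of the iso-Gaussian
process $W^{\mathbb{H}} = (Uh: h \in \mathbb{H})$ for $G$ the set of functions $K(t,\cdot)$ with $t$ ranging over $T$.
From the general theory of Gaussian processes

(3.2)  $$\frac{dP^{W^{\mathbb{H}} + \langle\cdot,f\rangle_{\mathbb{H}}}}{dP^{W^{\mathbb{H}}}}(W^{\mathbb{H}}) = e^{Uf - \frac12\|f\|_{\mathbb{H}}^2}, \qquad \text{a.s.}$$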

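The process $W^G$ arises from the iso-Gaussian process by the projection $\pi_G: \mathbb{R}^{\mathbb{H}} \to
\mathbb{R}^G$. The corresponding Radon-Nikodym derivative can be found as the conditional
expectation

$$\mathrm{E}\bigl(e^{Uf - \frac12\|f\|_{\mathbb{H}}^2} \mid W^G\bigr).$$
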

Because $\mathrm{lin}(G)$ is dense in $\mathbb{H}$ by construction and $U$ is continuous, the variable $Uf$ is
the $L_2(\Omega,\mathcal{U},P)$-limit of a sequence $Ug_n$ with $(g_n) \subset \mathrm{lin}(G)$ and hence is measurable
relative to the completion of the $\sigma$-field generated by $W^G$. Consequently, the right
side of (3.2) is $W^G$-measurable as well and hence the conditional expectation in the
preceding display is unnecessary.

Finally, note that the shift $g \mapsto \langle g,f\rangle_{\mathbb{H}}$ is exactly the function $f$ after the identification
$g \leftrightarrow K(t,\cdot)$, by the reproducing property: $f(t) = \langle K(t,\cdot), f\rangle_{\mathbb{H}}$ for every $t$. □

Let $\mathbb{H}$ be the abstract RKHS attached to a zero-mean, Borel measurable,
Gaussian random element $W$ in a separable Banach space $\mathbb{B}$ defined on a
probability space $(\Omega,\mathcal{U},P)$. Let $U: \mathbb{H} \to L_2(\Omega,\mathcal{U},P)$ be the isometry defined by

(3.3)  $$U(Sb^*) = b^*(W), \qquad b^* \in \mathbb{B}^*,$$

and extending continuously. It is the same map $U$ as in (3.1) if we make the
identification $S\pi_t = K(t,\cdot)$ of Theorem 2.1; also $US = A$ for $A$ defined in Section 2.2.
As before the map $U$ is an isometry. The preceding lemma can be translated to the
present situation.

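Lemma 3.2. If $h \in \mathbb{H}$ then the distributions $P^{W+h}$ and $P^W$ of $W + h$ and $W$ on
$\mathbb{B}$ are equivalent and

$$\frac{dP^{W+h}}{dP^W}(W) = e^{Uh - \frac12\|h\|_{\mathbb{H}}^2}, \qquad \text{a.s.}$$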

Proof. The process $W^{**} = (b^*(W): b^* \in \mathbb{B}^*)$ arising from $W$ through the canonical
embedding generates the same $\sigma$-field on the underlying probability space as $W$
and can be viewed as a measurable transformation of $W$ under the map $\phi: \mathbb{B} \to
\mathbb{R}^{\mathbb{B}^*}$ given by $\phi(b)(b^*) = b^*(b)$. The process $W + h$ is transformed into the process
$W^{**} + h^{**} = \phi(W + h)$. The result therefore follows from Lemma 3.1.

The following alternative proof is given in Proposition 2.1 in [-3]. The isometry
property of $U$ shows that $\mathrm{E}(Uh)^2 = \|h\|_{\mathbb{H}}^2$. Because $Uh$ is in the closed linear span
of the zero-mean Gaussian variables $USb^* = b^*W$, it is itself zero-mean Gaussian.
It follows that

$$dQ = e^{Uh - \frac12\|h\|_{\mathbb{H}}^2}\,dP$$

defines a probability measure on $(\Omega,\mathcal{U})$. For any $b_1^*, b_2^* \in \mathbb{B}^*$ the joint distribution
of $(USb_1^*, USb_2^*) = (b_1^*W, b_2^*W)$ is bivariate normal with mean zero and covariance
matrix $\bigl(\langle Sb_i^*, Sb_j^*\rangle_{\mathbb{H}}\bigr)_{i,j=1,2}$. Taking limits we see that for every $h \in \mathbb{H}$ the joint
distribution of $(b_1^*W, Uh)$ is bivariate normal with mean zero and covariance matrix
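$\Sigma$ with $\Sigma_{1,1} = \|Sb_1^*\|_{\mathbb{H}}^2$, $\Sigma_{1,2} = \langle Sb_1^*, h\rangle_{\mathbb{H}}$ and $\Sigma_{2,2} = \|h\|_{\mathbb{H}}^2$. Thus

$$\mathrm{E}_Q e^{ib_1^*W} = \mathrm{E}e^{ib_1^*W + Uh - \frac12\|h\|_{\mathbb{H}}^2} = e^{-\frac12\Sigma_{1,1} + i\Sigma_{1,2}}.$$

The right side is also equal to

$$\mathrm{E}e^{ib_1^*W}\,e^{i\langle Sb_1^*, h\rangle_{\mathbb{H}}} = \mathrm{E}e^{ib_1^*(W + h)}.$$
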
The last step follows from the reproducing formula (2.6). We conclude that the
distribution of $W + h$ under $P$ is the same as the distribution of $W$ under $Q$, i.e.
$P(W + h \in B) = \mathrm{E}_Q1_B(W) = \mathrm{E}1_B(W)(dQ/dP)$. □
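
The density formula of Lemma 3.2 can be verified directly when $\mathbb{B} = \mathbb{R}^3$ and $W \sim N(0,\Sigma)$ with $\Sigma$ invertible, an illustrative assumption under which $Uh = h^T\Sigma^{-1}W$ and $\|h\|_{\mathbb{H}}^2 = h^T\Sigma^{-1}h$. The sketch compares the logarithm of the ratio of the two Gaussian densities with $Uh - \frac12\|h\|_{\mathbb{H}}^2$ at an arbitrary point; the helper `log_density` is hypothetical.

```python
import numpy as np

def log_density(w, mean, cov):
    """Log density of N(mean, cov) at w, for positive definite cov."""
    d = len(w)
    diff = w - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(2)
a = rng.standard_normal((3, 3))
cov = a @ a.T + 0.5 * np.eye(3)
h = rng.standard_normal(3)        # shift; every h is in the RKHS when cov is invertible
w = rng.standard_normal(3)        # point at which the density ratio is evaluated

# Direct log-ratio of the laws of W + h and W at w.
direct = log_density(w, h, cov) - log_density(w, np.zeros(3), cov)

# Form of Lemma 3.2: Uh - ||h||_H^2 / 2 with Uh = h^T cov^{-1} w.
u_h = h @ np.linalg.solve(cov, w)
sq_norm_h = h @ np.linalg.solve(cov, h)
formula = u_h - 0.5 * sq_norm_h
```

The two quantities coincide up to rounding, which is the finite-dimensional case of the lemma.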

The preceding lemma requires that the shift h is contained in the RKHS. If this 
is not the case, then there is no density. 

Lemma 3.3. If $b \notin \mathbb{H}$ then the distributions $P^{W+b}$ and $P^W$ of $W + b$ and $W$ on
$\mathbb{B}$ are orthogonal.

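Proof. By Lemma 5.1 (below) the closure $\overline{\mathbb{H}}$ of $\mathbb{H}$ in $\mathbb{B}$ is the support of $W$. Because
the affine spaces $\overline{\mathbb{H}}$ and $\overline{\mathbb{H}} + b$ are disjoint if $b \notin \overline{\mathbb{H}}$, the assertion is clear if $b \in \mathbb{B} - \overline{\mathbb{H}}$.
Therefore, it is not a loss of generality to assume that $\mathbb{B}$ is the closure of $\mathbb{H}$.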

Fix a sequence $\{b_n^*\} \subset \mathbb{B}^*$ whose linear span is dense (for the norm) in $\mathbb{B}^*$ and
is such that the variables $b_n^*W$ are i.i.d. standard normal variables. We prove the
existence of such a sequence at the end of the proof. We claim that $\mathbb{H} = \{b \in
\mathbb{B}: \sum_{n=1}^\infty (b_n^*b)^2 < \infty\}$. Indeed, the sequence $h_n = Sb_n^*$ is orthonormal in $\mathbb{H}$ by the
definition of the inner product in $\mathbb{H}$ and $\mathrm{lin}(h_n) = S\,\mathrm{lin}(b_n^*)$ is dense in $S\mathbb{B}^*$ by
construction of the sequence $b_n^*$ and continuity of $S$. By the reproducing formula
$b_n^*h = \langle h, h_n\rangle_{\mathbb{H}}$ for every $h \in \mathbb{H}$, whence $\sum_n (b_n^*h)^2 < \infty$. Conversely, if $\sum_n (b_n^*b)^2 <
\infty$, then $h := \sum_n (b_n^*b)h_n$ is a well-defined element of $\mathbb{H}$, with $b_m^*h = b_m^*b$ for every
$m$ because $b_m^*h_n = \langle h_n, h_m\rangle_{\mathbb{H}} = \delta_{mn}$. Because the linear span of the sequence $(b_n^*)$
is dense in $\mathbb{B}^*$ it follows that $b^*h = b^*b$ for every $b^*$ and hence $b = h$, which is
contained in $\mathbb{H}$.

The map $\phi: \mathbb{B} \to \mathbb{R}^\infty$ defined by $b \mapsto (b_n^*b)$ is well defined and measurable. It
maps $W$ onto a sequence $(Z_n) = (b_n^*(W))$ of i.i.d. standard normal variables and maps
$W + b$ onto the sequence $(Z_n + b_n^*b)$ of independent shifted normal variables. By
Kakutani's dichotomy the latter two laws are orthogonal if $\sum_n (b_n^*b)^2 = \infty$. This
implies the orthogonality of the laws of $W$ and $W + b$.
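
The dichotomy used here can be made concrete through the Hellinger affinity, which is the standard form of Kakutani's criterion but is not spelled out in the text. For one coordinate the affinity between $N(0,1)$ and $N(a,1)$ equals $e^{-a^2/8}$, so the affinity between the two product laws is $\exp(-\sum_n a_n^2/8)$, which is positive if and only if $\sum_n a_n^2 < \infty$. The sketch evaluates partial products for two hypothetical shift sequences.

```python
import math

def hellinger_affinity(shifts):
    """Affinity between prod N(0, 1) and prod N(a_n, 1) over the given shifts.

    Each coordinate contributes exp(-a_n^2 / 8), so the product is
    exp(-sum_n a_n^2 / 8).
    """
    return math.exp(-sum(a * a for a in shifts) / 8.0)

N = 100_000
# Square-summable shifts a_n = 1/n: the affinity stays bounded away from zero.
summable = hellinger_affinity(1.0 / n for n in range(1, N + 1))
# Constant shifts a_n = 1: the sum of squares diverges and the affinity vanishes.
divergent = hellinger_affinity(1.0 for _ in range(2000))
# Limit in the summable case, using sum_n 1/n^2 = pi^2 / 6.
limit_summable = math.exp(-math.pi ** 2 / 48.0)
```

In the notation of the proof, $a_n = b_n^*b$: a vanishing affinity corresponds to orthogonal laws and hence to $b \notin \mathbb{H}$.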

Finally we prove the existence of $(b_n^*)$ as claimed. Starting with an arbitrary
dense sequence $(b_n^*)$ in $\mathbb{B}^*$, we can make this linearly independent by removing
from left to right in the sequence $b_1^*, b_2^*, \ldots$ every $b_n^*$ that can be written as a
linear combination of the preceding (left-over) $b_i^*$. This procedure yields a linearly
independent sequence $(b_n^*)$ whose span is dense in $\mathbb{B}^*$. The random variables $b_n^*W$
are automatically linearly independent in $L_2(\Omega,\mathcal{U},P)$. Indeed, suppose that
$\sum_n \lambda_nb_n^*W = 0$ almost surely for a sequence $\lambda_n$ with finitely many nonzero elements. This implies
that $\sum_n \lambda_nb_n^* = 0$ on a set with probability one under the law of $W$, and hence
by continuity also on the support of this law, which is $\mathbb{B}$ by assumption, so that
$\lambda_n = 0$ for every $n$ by the linear independence of the $b_n^*$. Thus
we can apply the Gram-Schmidt procedure to turn the sequence $b_n^*W$ into a
sequence of standard normal variables $(Z_n)$. Then $Z_n = \sum_{i=1}^n \lambda_{i,n}b_i^*W$ for every $n$
for a triangular array of coefficients $(\lambda_{i,n})$ with $\lambda_{n,n} \ne 0$ for every $n$. The sequence
$\sum_{i=1}^n \lambda_{i,n}b_i^*$ has the desired properties. □



4. Series representation 

Suppose that the covariance kernel $K$ of the Gaussian process $W = (W_t: t \in T)$,
defined on the probability space $(\Omega,\mathcal{U},P)$, can be written in the form

(4.1)  $$K(s,t) = \sum_{j=1}^\infty \lambda_j\phi_j(s)\phi_j(t)$$

for positive numbers $\lambda_1, \lambda_2, \ldots$ and arbitrary functions $\phi_j: T \to \mathbb{R}$, where the series
is assumed to converge pointwise on $T \times T$. The convergence on the diagonal implies
that $\sum_j \lambda_j\phi_j^2(t) < \infty$ for all $t \in T$. Then by the Cauchy-Schwarz inequality the
series $\sum_{j=1}^\infty w_j\phi_j(t)$ converges absolutely for every sequence $(w_j)$ of numbers with
$\sum_j w_j^2/\lambda_j < \infty$ and every $t$, and hence defines a function from $T$ to $\mathbb{R}$. We assume
that the functions $\phi_j$ are linearly independent in the sense that $\sum_j w_j\phi_j(t) = 0$ for
every $t \in T$, for some sequence $(w_j)$ with $\sum_j w_j^2/\lambda_j < \infty$, implies that $w_j = 0$ for
every $j \in \mathbb{N}$.

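Theorem 4.1. If the covariance function $K$ of the mean-zero Gaussian process
$W = (W_t: t \in T)$ can be represented as in (4.1) for numbers $\lambda_j$ and functions
$\phi_j: T \to \mathbb{R}$ which satisfy $\sum_{j=1}^\infty \lambda_j\phi_j^2(t) < \infty$ for every $t \in T$ and are linearly
independent as indicated, then the RKHS of the stochastic process $W$ is the set of
all functions $\sum_{j=1}^\infty w_j\phi_j$ with $\sum_{j=1}^\infty w_j^2/\lambda_j < \infty$, and the inner product is given
by

(4.2)  $$\Bigl\langle \sum_{j=1}^\infty v_j\phi_j, \sum_{j=1}^\infty w_j\phi_j\Bigr\rangle_{\mathbb{H}} = \sum_{j=1}^\infty \frac{v_jw_j}{\lambda_j}.$$
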
Proof. Under the condition that $\sum_{k=1}^\infty \lambda_k\phi_k^2(t) < \infty$ for every $t \in T$, the infinite
sum defining $K(s,t)$ converges for every $(s,t) \in T \times T$, by the Cauchy-Schwarz
inequality, and hence the kernel is well defined. Let $H$ be the set of all series
$\sum_{k=1}^\infty f_k\phi_k$ when $(f_k)$ ranges over the sequences with $\sum_{k=1}^\infty f_k^2/\lambda_k < \infty$. (These
series were noted to converge pointwise absolutely before the statement of the
theorem.) By the assumed linear independence of the functions $\phi_k$ the coefficients
$(f_k)$ are identifiable from the corresponding functions $\sum_k f_k\phi_k \in H$. Therefore we
can define a bijection $i: H \to \ell_2$ by $i: \sum_k f_k\phi_k \mapsto (f_k/\sqrt{\lambda_k})$. The set $H$ becomes a
Hilbert space under the inner product induced from $\ell_2$, which is given on the right
side of (4.2), and which we denote by $\langle\cdot,\cdot\rangle_H$. We must prove that this inner product
agrees with the inner product of $\mathbb{H}$ and that $H$ and $\mathbb{H}$ are the same as sets.

The function $K(s,\cdot)$ has a representation $\sum_{k=1}^\infty f_k\phi_k$ for $f_k = \lambda_k\phi_k(s)$, and hence
is contained in $H$. It also follows that

$$\langle K(s,\cdot), K(t,\cdot)\rangle_H = \sum_{k=1}^\infty \frac{\lambda_k\phi_k(s)\,\lambda_k\phi_k(t)}{\lambda_k} = K(s,t) = \langle K(s,\cdot), K(t,\cdot)\rangle_{\mathbb{H}},$$

where the second equality follows from the series representation of $K$, and the third
is (2.2). Thus the inner products of $H$ and $\mathbb{H}$ agree. We conclude that $H$ contains
$\mathbb{H}$ isometrically.

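The space $H$ has the reproducing property: $\langle f, K(t,\cdot)\rangle_H = f(t)$ for every $t \in T$
and $f \in H$. This follows from

$$\langle f, K(t,\cdot)\rangle_H = \Bigl\langle \sum_k f_k\phi_k, \sum_k \lambda_k\phi_k(t)\phi_k\Bigr\rangle_H = \sum_k \frac{f_k\lambda_k\phi_k(t)}{\lambda_k} = f(t).$$

If $f \in H$ with $f \perp \mathbb{H}$, then in particular $f \perp K(t,\cdot)$ for every $t \in T$ and hence
$f(t) = 0$ by the reproducing formula. Thus $H = \mathbb{H}$. □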

Series expansions of the type (4.1) are not unique, and some may be more useful
than others. They may arise as an eigenvalue expansion of the operator corresponding
to the covariance function. However, this is not a requirement of the theorem,
which applies to arbitrary functions $\phi_j$.

Example 4.1. Suppose that $(T, \mathcal{T}, \nu)$ is a measure space and

$$\int\!\!\int K^2(s,t)\,d\nu(s)\,d\nu(t) < \infty.$$

Then the integral operator $K: L_2(T,\mathcal{T},\nu) \to L_2(T,\mathcal{T},\nu)$ defined by

$$Kf(t) = \int f(s)K(s,t)\,d\nu(s)$$

is compact and positive self-adjoint. Thus there exists a sequence of eigenvalues
$\lambda_k \downarrow 0$ and an orthonormal system of eigenfunctions $\phi_k \in L_2(T,\mathcal{T},\nu)$ (thus $K\phi_k =
\lambda_k\phi_k$ for every $k \in \mathbb{N}$) such that (4.1) holds, where the series converges in $L_2(T \times
T, \mathcal{T} \times \mathcal{T}, \nu \times \nu)$. The series $\sum_k f_k\phi_k$ now converges in $L_2(T,\mathcal{T},\nu)$ for any sequence
$(f_k)$ in $\ell_2$. By the orthonormality of the functions $\phi_k$, they are certainly linearly
independent.

If the series (4.1) also converges pointwise on $T \times T$, then in particular $K(t,t) =
\sum_k \lambda_k\phi_k^2(t) < \infty$ for all $t \in T$ and Theorem 4.1 shows that the RKHS is the set of
all functions $\sum_k f_k\phi_k$ for sequences $(f_k)$ such that $(f_k/\sqrt{\lambda_k}) \in \ell_2$.

If the kernel is suitably regular, then we can apply the preceding with many 
choices of measure v, leading to different eigenfunction expansions. 
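
A classical instance, assumed here for illustration and not worked out in the text, is Brownian motion on $[0,1]$ with $\nu$ Lebesgue measure: $K(s,t) = \min(s,t)$, $\lambda_j = ((j-\frac12)\pi)^{-2}$ and $\phi_j(t) = \sqrt2\sin((j-\frac12)\pi t)$. The sketch below checks the truncated series (4.1) against $\min(s,t)$ and evaluates the squared norm $\sum_j f_j^2/\lambda_j$ of the function $f(t) = t$, whose coefficients $f_j = \int_0^1 t\phi_j(t)\,dt = \sqrt2(-1)^{j+1}((j-\frac12)\pi)^{-2}$ follow from integration by parts.

```python
import numpy as np

J = 2000                                  # truncation level of the series
j = np.arange(1, J + 1)
freq = (j - 0.5) * np.pi                  # a_j = (j - 1/2) pi
lam = 1.0 / freq ** 2                     # eigenvalues lambda_j

def phi(t):
    """Eigenfunctions phi_j(t) = sqrt(2) sin(a_j t); one row per point t."""
    return np.sqrt(2.0) * np.sin(np.outer(np.atleast_1d(t), freq))

def kernel_series(s, t):
    """Truncated series (4.1): sum_{j <= J} lambda_j phi_j(s) phi_j(t)."""
    return (phi(s) * lam) @ phi(t).T

grid = np.linspace(0.0, 1.0, 11)
approx = kernel_series(grid, grid)
exact = np.minimum(grid[:, None], grid[None, :])

# Coefficients of f(t) = t in the basis (phi_j), and its truncated squared RKHS norm.
f_coef = np.sqrt(2.0) * (-1.0) ** (j + 1) / freq ** 2
sq_norm = np.sum(f_coef ** 2 / lam)
```

The truncated squared norm approaches $1 = \int_0^1 f'(t)^2\,dt$, in agreement with the familiar description of the RKHS of Brownian motion.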
If the process itself can be expanded as a series

$$W_t = \sum_{j=1}^\infty \mu_jZ_j\phi_j(t)$$

for a sequence of i.i.d. standard normal variables $(Z_j)$ and suitable functions $\phi_j$,
where the series converges in $L_2(\Omega,\mathcal{U},P)$, then (4.1) holds with $\lambda_j = \mu_j^2$ and the
stochastic process RKHS takes the form given by the preceding theorem. The
following theorem gives a Banach space version of this result.

Theorem 4.2. Let $(h_i)$ be a sequence of elements in a separable Banach space $\mathbb{B}$
such that $\sum_{i=1}^\infty w_ih_i = 0$ for a sequence $w \in \ell_2$, where the convergence is in $\mathbb{B}$,
implies that $w = 0$. Let $(Z_i)$ be an i.i.d. sequence of standard normal variables
and assume that the series $W = \sum_{i=1}^\infty Z_ih_i$ converges almost surely in $\mathbb{B}$. Then the
RKHS of $W$ as a map in $\mathbb{B}$ is given by $\mathbb{H} = \{\sum_{i=1}^\infty w_ih_i: w \in \ell_2\}$ with squared
norm $\|\sum_i w_ih_i\|_{\mathbb{H}}^2 = \sum_i w_i^2$.

Proof. The almost sure convergence of the series $W = \sum_{i=1}^\infty Z_ih_i$ in $\mathbb{B}$ implies
the almost sure convergence of the series $b^*W = \sum_i Z_ib^*h_i$ in $\mathbb{R}$, for any $b^* \in
\mathbb{B}^*$. Because the partial sums of the last series are zero-mean Gaussian, the series
converges also in $L_2(\Omega,\mathcal{U},P)$. Hence for any $b_1^*, b_2^* \in \mathbb{B}^*$,

$$\mathrm{E}b_1^*Wb_2^*W = \mathrm{E}\sum_{i=1}^\infty Z_ib_1^*h_i \sum_{i=1}^\infty Z_ib_2^*h_i = \sum_{i=1}^\infty b_1^*h_i\,b_2^*h_i.$$

In particular, the sequence $(b^*h_i)$ is contained in $\ell_2$ for every $b^* \in \mathbb{B}^*$, with square
norm $\mathrm{E}(b^*W)^2$.

For $w \in \ell_2$ and natural numbers $m < n$, by the Hahn-Banach theorem and the
Cauchy-Schwarz inequality,

$$\Bigl\|\sum_{m<i\le n} w_ih_i\Bigr\|^2 = \sup_{\|b^*\|\le 1}\Bigl|\sum_{m<i\le n} w_ib^*h_i\Bigr|^2 \le \sum_{m<i\le n} w_i^2\ \sup_{\|b^*\|\le 1}\sum_{m<i\le n}(b^*h_i)^2.$$

As $m,n \to \infty$ the first factor on the far right tends to zero, since $w \in \ell_2$. By the first
paragraph the second factor is bounded by $\sup_{\|b^*\|\le 1}\mathrm{E}(b^*W)^2 \le \mathrm{E}\|W\|^2$. Hence
the partial sums of the series $\sum_i w_ih_i$ form a Cauchy sequence in $\mathbb{B}$, whence the
infinite series converges.

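Because $\sum_i w_ih_i$ was seen to converge for every $w \in \ell_2$, and $(b_1^*h_i) \in \ell_2$, it follows
that $\sum_i (b_1^*h_i)h_i$ converges in $\mathbb{B}$, and hence
$b_2^*\bigl(\sum_i (b_1^*h_i)h_i\bigr) = \sum_i b_1^*h_i\,b_2^*h_i = \mathrm{E}\,b_1^*W\,b_2^*W$, for any $b_2^* \in \mathbb{B}^*$.
This shows that $Sb_1^* = \sum_i (b_1^*h_i)h_i$ and the RKHS is not bigger than the space $\mathbb{H}$, as
claimed.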

The space would be smaller than claimed if there existed $w \in \ell_2$ that is not in
the closure of the linear span of the elements $(b^*h_i)$ of $\ell_2$ when $b^*$ ranges over $\mathbb{B}^*$.
We can take this $w$ without loss of generality as orthogonal to the latter collection,
i.e. $\sum_i w_ib^*h_i = 0$ for every $b^* \in \mathbb{B}^*$. This is equivalent to $\sum_i w_ih_i = 0$, which has
been excluded for any $w \ne 0$. □

It should be noted that the sequence $(h_i)$ in the preceding theorem consists of
arbitrary elements of the Banach space, only restricted by the linear independence
condition that $\sum_i w_ih_i = 0$ for $w \in \ell_2$ implies that $w = 0$ (and the convergence of
the random series $\sum_i Z_ih_i$). Combined with an i.i.d. standard normal sequence
as coefficients, this sequence turns into an orthonormal basis of the RKHS.

From the proof it can be seen that the linear independence is necessary. If it
fails, then the RKHS is the set of linear combinations $\sum_i w_ih_i$ with $w$ restricted to
the closure in $\ell_2$ of the set of sequences $(b^*h_i)$ when $b^*$ ranges over $\mathbb{B}^*$ and square
norm $\sum_i w_i^2$. (Taking these linear combinations for all $w \in \ell_2$ gives the same set,
but the $\ell_2$-norm should be computed for a projected $w$.)

Example 4.2. For $Z_0, \ldots, Z_k$ i.i.d. standard normal variables consider the polynomial
process $t \mapsto \sum_{i=0}^k Z_it^i/i!$ viewed as a map in (for instance) $C[0,1]$. The RKHS
of this process is equal to the set of $k$th degree polynomials $P_a(t) = \sum_{i=0}^k a_it^i/i!$
with square norm $\|P_a\|_{\mathbb{H}}^2 = \sum_{i=0}^k a_i^2$, i.e. the $k$th degree polynomials $P$ with square
norm $\|P\|_{\mathbb{H}}^2 = \sum_{i=0}^k P^{(i)}(0)^2$.
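
The norm in Example 4.2 can be checked numerically. The following is a minimal sketch, not part of the original argument: it computes the square norm from the derivatives at zero and compares it with the finite-dimensional formula $x^T\Sigma^{-1}x$ for the Gaussian vector of process values at $k+1$ distinct points. The function names and the chosen polynomial and points are illustrative assumptions.

```python
import math
import numpy as np

def poly_rkhs_sq_norm(monomial_coeffs):
    """Square RKHS norm of P(t) = sum_i c_i t^i under the process sum_i Z_i t^i / i!.

    Writing P = sum_i a_i t^i / i! gives a_i = c_i * i! = P^(i)(0), and the
    square norm is sum_i a_i^2.
    """
    a = [c * math.factorial(i) for i, c in enumerate(monomial_coeffs)]
    return float(np.sum(np.square(a)))

def sq_norm_via_covariance(monomial_coeffs, t):
    """Same quantity from the Gaussian vector of process values at k + 1 points t."""
    k = len(monomial_coeffs) - 1
    # A[j, i] = t_j^i / i!, so the vector of process values is A @ Z.
    A = np.array([[tj**i / math.factorial(i) for i in range(k + 1)] for tj in t])
    cov = A @ A.T
    p = np.polynomial.polynomial.polyval(np.asarray(t), monomial_coeffs)
    return float(p @ np.linalg.solve(cov, p))

coeffs = [1.0, -2.0, 0.5]          # P(t) = 1 - 2 t + 0.5 t^2 (illustrative choice)
t = [0.1, 0.5, 0.9]                # any k + 1 distinct points
direct = poly_rkhs_sq_norm(coeffs)
check = sq_norm_via_covariance(coeffs, t)
```

Both routes give the same value because a polynomial of degree $k$ is determined by its values at $k+1$ distinct points.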

Conversely, any Gaussian random element $W$ in a separable Banach space can
be expanded in a series $W = \sum_{i=1}^\infty Z_ih_i$ for i.i.d. standard normal variables $Z_i$ and
any orthonormal basis $(h_i)$ of its RKHS, where the series converges in the norm of
the Banach space. Because we can rewrite this expansion as $W = \sum_i \|h_i\|Z_i\bar h_i$,
where $\bar h_i = h_i/\|h_i\|$ is a sequence of norm one, the corresponding "eigenvalues"
are in this case the square norms $\|h_i\|^2$. To prove this result, recall the isometry
$U: \mathbb{H} \to L_2(\Omega,\mathcal{U},P)$ defined in (3.3).

Theorem 4.3. Let $(h_i)$ be a complete orthonormal system in the RKHS $\mathbb{H}$ of a
Borel measurable, zero-mean Gaussian random element $W$ in a separable Banach
space $\mathbb{B}$. Then $Uh_1, Uh_2, \ldots$ is an i.i.d. sequence of standard normal variables and
$W = \sum_{i=1}^\infty (Uh_i)h_i$, where the series converges in the norm of $\mathbb{B}$, almost surely.

Proof. It is immediate from the definitions of $U$ and the RKHS that
$U: \mathbb{H} \to L_2(\Omega,\mathcal{U},P)$ is an isometry. Because $U$ maps the subspace $S\mathbb{B}^* \subset \mathbb{H}$ into the
Gaussian process $b^*W$, it maps the completion $\mathbb{H}$ of $S\mathbb{B}^*$ into the completion of the linear
span of this process in $L_2(\Omega,\mathcal{U},P)$, which consists of normally distributed variables.
Because $U$ retains inner products, it follows that $Uh_1, Uh_2, \ldots$ is a sequence of i.i.d.
standard normal variables.

By the definition of $U$ and its continuity, for any $b^* \in \mathbb{B}^*$,

$$b^*W = U(Sb^*) = U\Bigl(\sum_{i=1}^\infty \langle Sb^*, h_i\rangle_{\mathbb{H}}h_i\Bigr) = \sum_{i=1}^\infty \langle Sb^*, h_i\rangle_{\mathbb{H}}\,Uh_i = \sum_{i=1}^\infty b^*(h_i)\,Uh_i,$$

where the last equality follows from the reproducing formula (2.6) and the series
converges in $L_2(\Omega,\mathcal{U},P)$. In other words, for any $b^* \in \mathbb{B}^*$, $b^*\bigl(\sum_{i=1}^n h_i(Uh_i)\bigr) =
\sum_{i=1}^n (b^*h_i)Uh_i$ converges in $L_2(\Omega,\mathcal{U},P)$ to $b^*W$. We wish to strengthen this to
convergence almost surely of $W_n := \sum_{i=1}^n h_i(Uh_i)$ to $W$ in $\mathbb{B}$. This is an immediate
consequence of the Lévy-Itô-Nisio theorem, as given in, e.g., ([9], Theorem 2.4),
according to which convergence in distribution of all "marginals" $b^*\sum_{i=1}^n X_i$ to the
marginals $b^*W$ of some Borel measurable map in a separable Banach space, for
$b^* \in \mathbb{B}^*$, implies the almost sure convergence of the series $\sum_i X_i$.

An alternative proof based on a martingale argument is given in ([()], Proposition
3.6). Let $Z_1, Z_2, \ldots$ be an orthonormal basis of the closed linear span of the variables
$b^*W$ in $L_2(\Omega,\mathcal{U},P)$. Then it can be seen that, for every $n$, $\mathrm{E}(W \mid Z_1, \ldots, Z_n) =
\sum_{i=1}^n Z_ih_i$ in a Banach space sense, for $h_i = \mathrm{E}Z_iW$. Convergence of the infinite series
follows by a martingale convergence theorem for Banach space valued variables. □

5. Support and concentration

The RKHS of a zero-mean Gaussian random element in a separable Banach 
space B is essential for an understanding of the spread of its distribution. 

To begin with, the support of $W$, the smallest closed set $\mathbb{B}_0$ in $\mathbb{B}$ with
$\mathrm{P}(W \in \mathbb{B}_0) = 1$, is the closure of the RKHS.

Lemma 5.1. The support of a mean-zero Gaussian random element $W$ in a separable
Banach space $\mathbb{B}$ is the closure of its RKHS in $\mathbb{B}$. It is also the closure of the
set $S\mathbb{B}^*$ in $\mathbb{B}$.

Proof. We first show that the probability $\mathrm{P}(\|W\| < \varepsilon)$ of an arbitrary open ball
centered around 0 is positive. Let $V$ be an independent copy of $W$. Because we can
cover $\mathbb{B}$ with countably many balls of radius $\varepsilon$, there exists some ball $B(h,\varepsilon)$ with
positive measure under the law of $W$. The difference $B(h,\varepsilon) - B(h,\varepsilon)$ is contained
in the ball of radius $2\varepsilon$ around 0. It follows that

$$\mathrm{P}\bigl(V - W \in B(0,2\varepsilon)\bigr) \ge \mathrm{P}\bigl(V \in B(h,\varepsilon)\bigr)\,\mathrm{P}\bigl(W \in B(h,\varepsilon)\bigr) > 0.$$

Now $(V - W)/\sqrt{2}$ is a zero-mean Gaussian process with the same covariance function
as $W$, and hence has the same distribution as $W$. It follows that
$\mathrm{P}\bigl(W \in B(0,\sqrt{2}\varepsilon)\bigr) > 0$ for every $\varepsilon > 0$.

Since the distribution of $W - h$ is equivalent to the distribution of $W$ for any $h$
in the RKHS, by Lemma 3.2, it follows that $\mathrm{P}(\|W - h\| < \varepsilon) > 0$ for any $\varepsilon > 0$
and $h \in \mathbb{H}$.

This remains true for an element $h \in \mathbb{B}$ that can be approximated arbitrarily
closely by elements from the RKHS. Thus the support of $W$ contains the closure of
the RKHS in $\mathbb{B}$.

By the Hahn-Banach theorem this closure $\bar{\mathbb{H}}$ can be written as

$$\bigcap_{b^* \in \mathbb{B}^*:\, b^*\mathbb{H} = 0} N(b^*),$$

where $b^*\mathbb{H} = 0$ means $b^*h = 0$ for all $h \in \mathbb{H}$ and $N(b^*) = \{b \in \mathbb{B}: b^*(b) = 0\}$
is the kernel of $b^*$. If $b^*\mathbb{H} = 0$, then in particular $b^*(Sb^*) = \mathrm{E}(b^*(W))^2 = 0$, and
hence $b^*(W) = 0$ almost surely. It follows that $\mathrm{P}\bigl(W \in N(b^*)\bigr) = 1$ for every $b^*$
in the display. By the preceding display the complement $\mathbb{B} - \bar{\mathbb{H}}$ is a union of the
open sets $N(b^*)^c$. Because an open set in a separable metric space is Lindelöf ([(i],
section 10) this union can be written as a union of countably many of the sets
$N(b^*)^c$. Equivalently, the intersection in the preceding display can be restricted to
a suitable countable subset. It follows that $\mathrm{P}(W \in \bar{\mathbb{H}}) = 1$.

The second assertion follows, because the RKHS-norm is stronger than the norm
of the containing Banach space. Completing the set $S\mathbb{B}^*$ for the RKHS-norm before
taking the closure in $\mathbb{B}$ does therefore not give a bigger set. □

An inequality of [1] gives further insight in the concentration of the distribution
of $W$. Let $\mathbb{H}_1$ and $\mathbb{B}_1$ be the unit balls of the RKHS and the space $\mathbb{B}$, respectively.
The inequality involves the (centered) small ball probability

$$\mathrm{P}(\|W\| < \varepsilon) = e^{-\phi_0(\varepsilon)}.$$

Theorem 5.1. (Borell's inequality.) For any $\varepsilon > 0$ and $M \ge 0$,

$$\mathrm{P}(W \in \varepsilon\mathbb{B}_1 + M\mathbb{H}_1) \ge \Phi\bigl(\Phi^{-1}(e^{-\phi_0(\varepsilon)}) + M\bigr).$$

Here $\Phi$ is the cumulative distribution function of the standard normal distribution.
For fixed $\varepsilon > 0$ the right side increases to 1 as $M \to \infty$, and its distance to 1
decreases according to the tails of the standard normal distribution. This shows
that the "geometry of the concentration" of $W$ is given by the unit ball of the RKHS.
Summing the small ball $\varepsilon\mathbb{B}_1$ to the multiple $M\mathbb{H}_1$ can be seen as enlarging the latter
set with an $\varepsilon$-neighbourhood. In general this is necessary to capture the mass of $W$,
because the support of $W$ is the closure of the RKHS; the RKHS itself may have
probability zero. For $M \to \infty$ we obtain the equality $\mathrm{P}(W \in \varepsilon\mathbb{B}_1 + \mathbb{H}) = 1$, for any
$\varepsilon > 0$, which (again) shows that $W$ is supported within the closure of $\mathbb{H}$.

Example 5.1. For a mean-zero normal vector in $\mathbb{B} = \mathbb{R}^k$ with covariance matrix
$\Sigma$, the RKHS is the range of the covariance matrix equipped with the inner product
$\langle \Sigma g, \Sigma h\rangle_{\mathbb{H}} = g^T\Sigma h$. This follows, because $\mathbb{B}^* = \mathbb{R}^k$ and, for the element $g \in \mathbb{B}^*$
given by $h \mapsto h^Tg$, we have $Sg = \mathrm{E}WW^Tg = \Sigma g$. The inner product of the RKHS
is $\langle Sg, Sh\rangle_{\mathbb{H}} = \mathrm{E}g^TW\,h^TW = g^T\Sigma h$.

The unit ball $\mathbb{H}_1$ is the set $\{\Sigma h: h^T\Sigma h \le 1\}$. For nonsingular $\Sigma$ this set is the
ellipsoid determined by the inverse matrix $\Sigma^{-1}$, i.e., the ellipsoid determined by
the level sets of the density. For singular $\Sigma$ the distribution is concentrated on a
lower-dimensional subspace, and we have a similar interpretation after projection
on this subspace.
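
As a small illustration of Example 5.1, the following sketch computes the square RKHS norm of a vector $x$ for a possibly singular covariance matrix. It is a sketch under stated assumptions: the helper name `rkhs_sq_norm`, the tolerance, and the use of the pseudo-inverse to find some $h$ with $\Sigma h = x$ are choices made here, not taken from the text.

```python
import numpy as np

def rkhs_sq_norm(cov, x, tol=1e-10):
    """Square RKHS norm of x for N(0, cov); infinite if x is outside the range of cov."""
    cov_pinv = np.linalg.pinv(cov)
    h = cov_pinv @ x                      # candidate h with cov @ h = x
    if not np.allclose(cov @ h, x, atol=tol):
        return float("inf")               # x is not in the range of cov
    return float(h @ cov @ h)             # <cov h, cov h>_H = h^T cov h

cov = np.array([[2.0, 0.0], [0.0, 0.0]])  # singular: mass on the first axis only
inside = rkhs_sq_norm(cov, np.array([1.0, 0.0]))   # equals 1^2 / 2
outside = rkhs_sq_norm(cov, np.array([0.0, 1.0]))  # direction carrying no mass
```

For nonsingular $\Sigma$ the returned value reduces to $x^T\Sigma^{-1}x$, the quadratic form that defines the ellipsoid.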

Borell's inequality is often quoted as only an exponential inequality on the norm
$\|W\|$, but this is in fact a consequence. The distribution of the norm $\|W\|$ of a
non-zero Borel measurable Gaussian map $W$ does not have atoms (cf., [2]) and
therefore has a unique median $M(W)$.

Corollary 5.1. For any $x > 0$,

$$\mathrm{P}\bigl(\|W\| - M(W) > x\bigr) \le 1 - \Phi\bigl(x/\sigma(W)\bigr).$$

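Proof. For $\varepsilon = M(W)$ we have $\mathrm{P}(W \in \varepsilon\mathbb{B}_1) = \mathrm{P}(\|W\| \le M(W)) = 1/2$. Hence
the choices $\varepsilon = M(W)$ and $M = x/\sigma(W)$ in Borell's inequality yield the inequality
$\mathrm{P}\bigl(W \in M(W)\mathbb{B}_1 + (x/\sigma(W))\mathbb{H}_1\bigr) \ge \Phi(x/\sigma(W))$. Because $\mathbb{H}_1 \subset \sigma(W)\mathbb{B}_1$ by (2.5),
the left side is smaller than $\mathrm{P}\bigl(W \in (M(W) + x)\mathbb{B}_1\bigr)$, which is 1 minus the left side
of the corollary. □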

According to Anderson's lemma (e.g., [9], p. 73, [16], p. 72, or [17], 3.11.4) a
ball of fixed radius receives maximum mass of a zero-mean Gaussian distribution
if centered at the origin. The following lemma gives a lower bound on the decrease
in mass if the ball is centered at an element of the RKHS. The lemma is implicit
in the proof of the main result in [7], and appears explicitly as (4.16) in [8].

Lemma 5.2. If $h \in \mathbb{H}$, then for every Borel measurable set $C \subset \mathbb{B}$ with $C = -C$,

$$\mathrm{P}(W - h \in C) \ge e^{-\frac12\|h\|_{\mathbb{H}}^2}\,\mathrm{P}(W \in C).$$

Proof. By symmetry $W$ and $-W$ are identically distributed and hence
$\mathrm{P}(W + h \in C) = \mathrm{P}(-W + h \in -C) = \mathrm{P}(W - h \in C)$. By Lemma 3.2,

$$\mathrm{P}(W + h \in C) = \mathrm{E}1_C(W + h) = \mathrm{E}e^{Uh - \frac12\|h\|_{\mathbb{H}}^2}1_C(W).$$

This is true with $-h$ instead of $h$ as well. Combining these facts yields that

$$\mathrm{P}(W - h \in C) = \tfrac12\mathrm{E}e^{Uh - \frac12\|h\|_{\mathbb{H}}^2}1_C(W) + \tfrac12\mathrm{E}e^{U(-h) - \frac12\|-h\|_{\mathbb{H}}^2}1_C(W)$$
$$= e^{-\frac12\|h\|_{\mathbb{H}}^2}\,\mathrm{E}\cosh(Uh)1_C(W) \ge e^{-\frac12\|h\|_{\mathbb{H}}^2}\,\mathrm{P}(W \in C),$$

since $\cosh x = (e^x + e^{-x})/2 \ge 1$ for every $x$. □
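
The bound of Lemma 5.2 can be inspected in the simplest case of a standard normal variable $W$ in $\mathbb{B} = \mathbb{R}$, for which the RKHS is $\mathbb{R}$ with $\|h\|_{\mathbb{H}} = |h|$ (Example 5.1 with $\Sigma = 1$). The sketch below evaluates both sides for symmetric intervals $C = (-\varepsilon, \varepsilon)$; the function names and the grid of values are illustrative assumptions.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def shifted_ball_prob(h, eps):
    """P(|W - h| < eps) for W ~ N(0, 1)."""
    return std_normal_cdf(h + eps) - std_normal_cdf(h - eps)

def lemma_5_2_gap(h, eps):
    """Left side minus right side of the bound; nonnegative if the bound holds."""
    return shifted_ball_prob(h, eps) - math.exp(-0.5 * h * h) * shifted_ball_prob(0.0, eps)

gaps = [lemma_5_2_gap(h, eps) for h in (0.0, 0.5, 1.0, 3.0) for eps in (0.01, 0.1, 1.0)]
```

The gap vanishes at $h = 0$ and is small for small $\varepsilon$, in line with the factor $\cosh(Uh)$ in the proof being close to 1 on small balls.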

The lemma with $C$ equal to the ball of radius $\varepsilon$ around 0 refers to the noncentered
small ball probabilities $\mathrm{P}(\|W - w\| < \varepsilon)$, for every $w$ in the RKHS. Up to constants
these can be completely characterized through the corresponding centered small
ball probabilities and approximation of the center $w$ from the RKHS. Define

(5.1) $\phi_w(\varepsilon) = \inf_{h \in \mathbb{H}:\, \|h - w\| \le \varepsilon} \tfrac12\|h\|_{\mathbb{H}}^2 - \log \mathrm{P}(\|W\| < \varepsilon)$.

For $w = 0$ this agrees with the negative exponent $\phi_0(\varepsilon)$ of the small ball probability
$\mathrm{P}(\|W\| < \varepsilon) = e^{-\phi_0(\varepsilon)}$ defined previously. Up to constants this quantity gives the
exponent of the small ball probability at center $w$.

Lemma 5.3. For any $w$ in the support of $W$ and every $\varepsilon > 0$,

$$\phi_w(\varepsilon) \le -\log \mathrm{P}(\|W - w\| < \varepsilon) \le \phi_w(\varepsilon/2).$$

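Proof. For any $h \in \mathbb{H}$ with $\|h - w\| \le \varepsilon$ we have $\|W - w\| \le \varepsilon + \|W - h\|$ and
hence $\mathrm{P}(\|W - w\| < 2\varepsilon) \ge \mathrm{P}(\|W - h\| < \varepsilon)$. The latter probability can be bounded
below by $\exp(-\tfrac12\|h\|_{\mathbb{H}}^2)\,\mathrm{P}(\|W\| < \varepsilon)$, in view of the preceding lemma. We conclude
by optimizing over $h \in \mathbb{H}$.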

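The set $B_\varepsilon = \{h \in \mathbb{H}: \|h - w\| \le \varepsilon\}$ is convex and closed in $\mathbb{H}$, because the
RKHS topology is stronger than the norm topology. Therefore the (convex) map
$h \mapsto \|h\|_{\mathbb{H}}^2$ attains a minimum on $B_\varepsilon$ at some point $h_\varepsilon$. Because $(1 - \lambda)h_\varepsilon + \lambda h \in B_\varepsilon$
for every $h \in B_\varepsilon$ and $0 \le \lambda \le 1$, it follows that $\|(1 - \lambda)h_\varepsilon + \lambda h\|_{\mathbb{H}}^2 \ge \|h_\varepsilon\|_{\mathbb{H}}^2$, which
implies that $2\lambda\langle h - h_\varepsilon, h_\varepsilon\rangle_{\mathbb{H}} + \lambda^2\|h - h_\varepsilon\|_{\mathbb{H}}^2 \ge 0$. The fact that this is true for every
$0 \le \lambda \le 1$ can be seen to imply that $\langle h, h_\varepsilon\rangle_{\mathbb{H}} \ge \|h_\varepsilon\|_{\mathbb{H}}^2$ for every $h \in B_\varepsilon$.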

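By Theorem 4.3 the process $W$ can be written as $W = \sum_{i=1}^\infty (Uh_i)h_i$, for a
given complete orthonormal system $h_1, h_2, \ldots$ in $\mathbb{H}$, where the series converges
almost surely in norm. The truncated series $W^m = \sum_{i=1}^m (Uh_i)h_i$ takes its values
in $\mathbb{H}$. If $\|W - g - w\| < \varepsilon$ for some arbitrary $g \in \mathbb{H}$, then $\|W^m - g - w\| < \varepsilon$
for sufficiently large $m$, almost surely. Equivalently, $W^m - g \in B_\varepsilon$ and hence the
preceding paragraph implies that $\langle W^m - g, h_\varepsilon\rangle_{\mathbb{H}} \ge \|h_\varepsilon\|_{\mathbb{H}}^2$, eventually as $m \to \infty$,
almost surely. Here $\langle W^m, h_\varepsilon\rangle_{\mathbb{H}} = \sum_{i=1}^m (Uh_i)\langle h_i, h_\varepsilon\rangle_{\mathbb{H}} = U\sum_{i=1}^m h_i\langle h_i, h_\varepsilon\rangle_{\mathbb{H}}$. By
the continuity of $U$ the right side converges in $L_2(\Omega,\mathcal{U},P)$ to $Uh_\varepsilon$ as $m \to \infty$, and
hence almost surely along a subsequence. We conclude that $Uh_\varepsilon - \langle g, h_\varepsilon\rangle_{\mathbb{H}} \ge \|h_\varepsilon\|_{\mathbb{H}}^2$
almost surely on the event $\{\|W - g - w\| < \varepsilon\}$. In particular the choice $g = -h_\varepsilon$
yields that $Uh_\varepsilon \ge 0$ almost surely on the event $\{\|W + h_\varepsilon - w\| < \varepsilon\}$.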

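By Lemma 3.2,

$$\mathrm{P}(W \in w + \varepsilon\mathbb{B}_1) = \mathrm{P}(W - h_\varepsilon \in w - h_\varepsilon + \varepsilon\mathbb{B}_1)$$
$$= \mathrm{E}e^{-Uh_\varepsilon - \frac12\|h_\varepsilon\|_{\mathbb{H}}^2}1\{W \in w - h_\varepsilon + \varepsilon\mathbb{B}_1\} \le e^{-\frac12\|h_\varepsilon\|_{\mathbb{H}}^2}\,\mathrm{P}(W \in w - h_\varepsilon + \varepsilon\mathbb{B}_1),$$

by the preceding paragraph. The probability on the right side is smaller than
$\mathrm{P}(W \in \varepsilon\mathbb{B}_1)$ by Anderson's lemma. □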

6. Small ball probability and entropy 

The unit ball of the RKHS not only expresses the shape of the Gaussian measure,
but also allows a quantitative estimate of the small ball probability
$e^{-\phi_0(\varepsilon)} = \mathrm{P}(\|W\| < \varepsilon)$ through its entropy within the Banach space.

Let $N(\varepsilon, \mathbb{H}_1, \|\cdot\|)$ be the smallest number of balls of radius $\varepsilon > 0$ needed to
cover the unit ball $\mathbb{H}_1$ of the RKHS. This is bounded by the maximal number $D(\varepsilon)$
of points $h_i$ in $\mathbb{H}_1$ with $\|h_i - h_j\| \ge \varepsilon$ for $i \ne j$. Because each ball of radius $\varepsilon/2$
around a point $h_i$ has probability at least $e^{-1/2}\,\mathrm{P}(\|W\| < \varepsilon/2)$ by Lemma 5.2 and
these balls are disjoint, it follows that $1 \ge D(\varepsilon)e^{-1/2}\,\mathrm{P}(\|W\| < \varepsilon/2)$, whence $D(\varepsilon)$
is finite for every $\varepsilon > 0$. This shows that the RKHS unit ball $\mathbb{H}_1$ is precompact in
$\mathbb{B}$.

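The following results, which were proved by [7] and [10], refine this argument,
and show roughly that for regularly behaved entropy $\varepsilon \mapsto \log N(\varepsilon, \mathbb{H}_1, \|\cdot\|)$ and
small ball exponent $\varepsilon \mapsto \phi_0(\varepsilon)$, and for small $\varepsilon$,

$$\log N\Bigl(\frac{\varepsilon}{\sqrt{2\phi_0(\varepsilon)}}, \mathbb{H}_1, \|\cdot\|\Bigr) \asymp \phi_0(\varepsilon).$$

However, the exact statement has several constants in it.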

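Lemma 6.1. Let $f: (0,\infty) \to (0,\infty)$ be regularly varying at zero. Then

(i) $\log N\bigl(\varepsilon/\sqrt{2\phi_0(\varepsilon)}, \mathbb{H}_1, \|\cdot\|\bigr) \gtrsim \phi_0(2\varepsilon)$.

(ii) If $\phi_0(\varepsilon) \lesssim f(\varepsilon)$, then $\log N\bigl(\varepsilon/\sqrt{f(\varepsilon)}, \mathbb{H}_1, \|\cdot\|\bigr) \lesssim f(\varepsilon)$.

(iii) If $\log N(\varepsilon, \mathbb{H}_1, \|\cdot\|) \gtrsim f(\varepsilon)$, then $\phi_0(\varepsilon) \gtrsim f\bigl(\varepsilon/\sqrt{\phi_0(\varepsilon)}\bigr)$.

(iv) If $\log N(\varepsilon, \mathbb{H}_1, \|\cdot\|) \lesssim f(\varepsilon)$, then $\phi_0(2\varepsilon) \lesssim f\bigl(\varepsilon/\sqrt{\phi_0(\varepsilon)}\bigr)$.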

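Lemma 6.2. For $\alpha > 0$ and $\beta \in \mathbb{R}$, as $\varepsilon \downarrow 0$, $\phi_0(\varepsilon) \asymp \varepsilon^{-\alpha}(\log 1/\varepsilon)^\beta$ if and only if
$\log N(\varepsilon, \mathbb{H}_1, \|\cdot\|) \asymp \varepsilon^{-2\alpha/(2+\alpha)}(\log 1/\varepsilon)^{2\beta/(2+\alpha)}$.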
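
The correspondence between exponents in Lemma 6.2 is easy to tabulate. The sketch below only encodes the two algebraic maps between $(\alpha, \beta)$ and the entropy exponents; the function names are hypothetical and nothing here verifies the lemma itself.

```python
def entropy_exponents(alpha, beta=0.0):
    """Map (alpha, beta) of the small ball exponent to the entropy exponents.

    phi_0(eps) ~ eps^(-alpha) (log 1/eps)^beta  corresponds to
    log N(eps) ~ eps^(-a) (log 1/eps)^b  with the returned (a, b).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 2.0 * alpha / (2.0 + alpha), 2.0 * beta / (2.0 + alpha)

def small_ball_exponents(a, b=0.0):
    """Inverse map; requires 0 < a < 2."""
    if not 0 < a < 2:
        raise ValueError("entropy exponent must lie in (0, 2)")
    alpha = 2.0 * a / (2.0 - a)
    return alpha, b * (2.0 + alpha) / 2.0

a, b = entropy_exponents(2.0)        # small ball exponent of order eps^-2
```

Note that the entropy exponent $2\alpha/(2+\alpha)$ is always smaller than 2, whatever the value of $\alpha$.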

7. RKHS under transformation 

If a Gaussian process is transformed into another Gaussian process under a one-to- 
one, continuous, linear map, then the RKHS is transformed in parallel. 

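Lemma 7.1. Let $T: \mathbb{B} \to \underline{\mathbb{B}}$ be a one-to-one, continuous, linear map from a
separable Banach space $\mathbb{B}$ into a Banach space $\underline{\mathbb{B}}$ and let $W$ be a Borel measurable,
zero-mean Gaussian random element in $\mathbb{B}$ with RKHS $\mathbb{H}$. Then the RKHS $\underline{\mathbb{H}}$ of the
Gaussian random element $TW$ in $\underline{\mathbb{B}}$ is equal to $T\mathbb{H}$ and $T: \mathbb{H} \to \underline{\mathbb{H}}$ is an isometry
for the RKHS-norms.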

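Proof. Let $T^*: \underline{\mathbb{B}}^* \to \mathbb{B}^*$ be the adjoint of $T$. The RKHS $\underline{\mathbb{H}}$ of $TW$ is by definition
the completion of the set of Pettis integrals

$$\underline{S}\,\underline{b}^* = \mathrm{E}(TW)\underline{b}^*(TW) = T\bigl(\mathrm{E}W\underline{b}^*(TW)\bigr) = TST^*\underline{b}^*,$$

for the inner product

$$\langle \underline{S}\,\underline{b}_1^*, \underline{S}\,\underline{b}_2^*\rangle_{\underline{\mathbb{H}}} = \mathrm{E}\,\underline{b}_1^*(TW)\underline{b}_2^*(TW) = \mathrm{E}(T^*\underline{b}_1^*W)(T^*\underline{b}_2^*W) = \langle ST^*\underline{b}_1^*, ST^*\underline{b}_2^*\rangle_{\mathbb{H}}.$$

It follows that the element $\underline{S}\,\underline{b}^*$ of $\underline{\mathbb{H}}$ is the image under $T$ of the element $ST^*\underline{b}^*$ of
$\mathbb{H}$, and its norm is the same: $\|\underline{S}\,\underline{b}^*\|_{\underline{\mathbb{H}}} = \|ST^*\underline{b}^*\|_{\mathbb{H}}$. Thus $T: ST^*\underline{\mathbb{B}}^* \subset \mathbb{H} \to \underline{\mathbb{H}}$ is an
isometry for the RKHS-norms. It extends by continuity to a linear map from the
completion $\mathbb{H}_0$ of $ST^*\underline{\mathbb{B}}^*$ in $\mathbb{H}$ to $\underline{\mathbb{H}}$. Because $T$ is continuous for the norm of $\mathbb{B}$, this
extension agrees with $T$. Because $T: ST^*\underline{\mathbb{B}}^* \to \underline{S}\,\underline{\mathbb{B}}^*$ is onto, $T$ is an isometry for
the RKHS-norms, and $\mathbb{H}_0$ and $\underline{\mathbb{H}}$ are by definition the completions of $ST^*\underline{\mathbb{B}}^*$ and
$\underline{S}\,\underline{\mathbb{B}}^*$, we have that $T: \mathbb{H}_0 \to \underline{\mathbb{H}}$ is an isometry onto $\underline{\mathbb{H}}$. It remains to be shown that
$\mathbb{H}_0 = \mathbb{H}$.

Because $T$ is one-to-one, the range $T^*\underline{\mathbb{B}}^*$ of its adjoint is weak-* dense in $\mathbb{B}^*$
([12], Corollary 4.12). By Lemma 2.2 the map $S: \mathbb{B}^* \to \mathbb{H}$ is continuous relative to
the weak-* and RKHS topologies. Combined this yields that $S(T^*\underline{\mathbb{B}}^*)$ is dense in
$S\mathbb{B}^*$ for the RKHS-norm of $\mathbb{H}$ and hence is dense in $\mathbb{H}$.

Taken together the preceding shows that $T: \mathbb{H} \to \underline{\mathbb{H}}$ is an isometry onto $\underline{\mathbb{H}}$. □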
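
In finite dimensions Lemma 7.1 can be checked directly with the formula of Example 5.1. The sketch below is an illustration under assumptions made here: $W$ is a nondegenerate normal vector in $\mathbb{R}^3$, $T$ is a randomly drawn $5 \times 3$ matrix of full column rank, and the RKHS norm of $TW$ is evaluated through a pseudo-inverse of its singular covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 5
L = rng.standard_normal((d, d))
cov_w = L @ L.T + np.eye(d)            # nonsingular covariance of W in R^d
T = rng.standard_normal((m, d))        # injective when it has full column rank
assert np.linalg.matrix_rank(T) == d

cov_tw = T @ cov_w @ T.T               # covariance of TW in R^m (singular, rank d)
h = rng.standard_normal(d)             # element of the RKHS of W (all of R^d here)

# Square norm of h in the RKHS of W.
norm_h = h @ np.linalg.solve(cov_w, h)
# Square norm of Th in the RKHS of TW, via the pseudo-inverse on the range of T.
norm_th = (T @ h) @ np.linalg.pinv(cov_tw, rcond=1e-10) @ (T @ h)
```

The two numbers agree up to rounding, which is the isometry property in this special case.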

8. RKHS relative to different norms 

A stochastic process $W$ can often be viewed as a map into several Banach spaces.
For instance, a process indexed by the unit interval with continuous sample paths
is a Borel measurable map in the space $C[0,1]$, but also in the space $L_2[0,1]$; a
process with continuously differentiable sample paths is a map in $C[0,1]$, but also
in $C^1[0,1]$. The RKHS obtained from using a weaker Banach space is typically the
same.

Lemma 8.1. Let $(\mathbb{B}, \|\cdot\|)$ be a separable Banach space and let $\|\cdot\|'$ be a norm
on $\mathbb{B}$ with $\|b\|' \le \|b\|$. Then the RKHS of a Borel measurable zero-mean Gaussian
random element in $(\mathbb{B}, \|\cdot\|)$ is the same as the RKHS of this map viewed in the
completion of $\mathbb{B}$ under $\|\cdot\|'$.

Proof. Let $\mathbb{B}'$ be the completion of $\mathbb{B}$ relative to $\|\cdot\|'$. The assumptions imply that
the identity map $I: (\mathbb{B}, \|\cdot\|) \to (\mathbb{B}', \|\cdot\|')$ is continuous, linear and one-to-one. The
lemma therefore is a consequence of Lemma 7.1. □

Example 8.1. Let $W$ be a mean zero Gaussian process indexed by the unit interval
$[0,1]$ with covariance function $K(s,t) = \mathrm{E}W_sW_t$.

If $W$ has continuous sample paths, then it is a random element in $C[0,1]$. The
RKHS of $W$ viewed as a random element in $C[0,1]$ is the completion of the linear
span of the functions $K(t,\cdot)$ under the inner product (2.2).

If $W$ is a measurable process and $\int_0^1 W_s^2\,ds < \infty$ surely, then $W$ is a random
element in $L_2[0,1]$. The dual space of $L_2[0,1]$ consists of the maps $g \mapsto \int g(s)f(s)\,ds$
for $f$ ranging over $L_2[0,1]$, and $Sf(t) = \mathrm{E}W_t\int W_sf(s)\,ds = \int K(s,t)f(s)\,ds$. Therefore,
the RKHS of $W$ viewed as a random element in $L_2[0,1]$ is the completion of
the linear span of the functions $t \mapsto \int K(s,t)f(s)\,ds$ for $f$ ranging over $L_2[0,1]$
under the inner product $\langle Sf, Sg\rangle_{\mathbb{H}} = \int\!\!\int K(s,t)f(s)g(t)\,ds\,dt$.

If $W$ has continuous sample paths, then its covariance kernel is continuous, and
it can be shown by direct arguments that the two RKHSs agree. This also follows
from the preceding lemma.

9. RKHS under independent sums 

If a given Gaussian prior misses certain desirable "directions" in its RKHS, then
these can be filled in by adding independent Gaussian components in these directions.
A closed linear subspace $\mathbb{B}_0 \subset \mathbb{B}$ of a Banach space $\mathbb{B}$ is complemented if
there exists a closed linear subspace $\mathbb{B}_1$ with $\mathbb{B} = \mathbb{B}_0 + \mathbb{B}_1$ and $\mathbb{B}_0 \cap \mathbb{B}_1 = \{0\}$.

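Lemma 9.1. Let $V$ and $W$ be independent Borel measurable, zero-mean, jointly
Gaussian maps from a given probability space into a separable Banach space $\mathbb{B}$ with
supports $\mathbb{B}^V$ and $\mathbb{B}^W$ such that $\mathbb{B}^V \cap \mathbb{B}^W = \{0\}$ and the subspace $\mathbb{B}^V$ is complemented
by a subspace that contains $\mathbb{B}^W$. Then the RKHS of $V + W$ is the direct
sum of the RKHSs of $V$ and $W$ and the RKHS norms satisfy
$\|h^V + h^W\|_{\mathbb{H}^{V+W}}^2 = \|h^V\|_{\mathbb{H}^V}^2 + \|h^W\|_{\mathbb{H}^W}^2$.

Proof. By the independence of $V$ and $W$ the Pettis integral $S^{V+W}b^* = \mathrm{E}(V + W)b^*(V + W)$
can be written as $S^{V+W}b^* = S^Vb^* + S^Wb^*$. The assumptions of
trivial intersection $\mathbb{B}^V \cap \mathbb{B}^W = \{0\}$ and of complementation of $\mathbb{B}^V$ entail that there
exists a continuous linear map $\Pi: \mathbb{B} \to \mathbb{B}^V$ such that $\Pi b = b$ if $b \in \mathbb{B}^V$ and $\Pi b = 0$
if $b \in \mathbb{B}^W$ (cf., [()], 29.2). Then $b^* \circ \Pi \in \mathbb{B}^*$, $\Pi V = V$ and $\Pi W = 0$ almost
surely, whence $S^W(b^* \circ \Pi) = 0$ and hence $S^{V+W}(b^* \circ \Pi) = S^Vb^*$. It follows that
$S^V\mathbb{B}^* \subset S^{V+W}\mathbb{B}^*$ and by symmetry $S^W\mathbb{B}^* \subset S^{V+W}\mathbb{B}^*$. Also, for any $b_1^*, b_2^* \in \mathbb{B}^*$,

$$\langle S^Vb_1^*, S^Wb_2^*\rangle_{\mathbb{H}^{V+W}} = \langle S^{V+W}b_1^* \circ \Pi,\ S^{V+W}b_2^* \circ (I - \Pi)\rangle_{\mathbb{H}^{V+W}}$$
$$= \mathrm{E}\bigl(b_1^* \circ \Pi(V + W)\bigr)\bigl(b_2^* \circ (I - \Pi)(V + W)\bigr)$$
$$= \mathrm{E}(b_1^*V)(b_2^*W) = 0.$$

We conclude that $S^V\mathbb{B}^* \perp S^W\mathbb{B}^*$ in $\mathbb{H}^{V+W}$, so that $\mathbb{H}^{V+W}$ is the direct (orthogonal)
sum of $\mathbb{H}^V$ and $\mathbb{H}^W$. Furthermore $\|S^{V+W}b^*\|_{\mathbb{H}^{V+W}}^2 = \mathrm{E}(b^*(V + W))^2 = \mathrm{E}(b^*V)^2 +
\mathrm{E}(b^*W)^2$. □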

By the Hahn-Banach theorem the assumption of complementation is certainly 
satisfied as soon as one of the supports of V and W is finite-dimensional. 

The assumption that $\mathbb{B}^V \cap \mathbb{B}^W = \{0\}$ can be interpreted as requiring "linear
independence" rather than some form of orthogonality of the supports of $V$ and
$W$. The stochastic independence of $V$ and $W$ translates the linear independence
into orthogonality in the RKHS of $V + W$.

The assumption requires trivial intersection of the supports of the variables V 
and W, rather than of sets that carry probability one. Because the RKHS is in- 
dependent of the norm (Lemma 8.1) the closure operation involved in computing 
the support may be taken for the strongest norm which is defined on the random 
elements. 

The assumption that $\mathbb{B}^V \cap \mathbb{B}^W = \{0\}$ cannot be removed. For instance, if
$V = \sum_{i=1}^\infty \mu_iZ_i\psi_i$ and $W = \sum_{i=1}^\infty \mu_i'Z_i'\psi_i$ are series expansions with independent standard
normal variables $(Z_i)$, $(Z_i')$ on a common basis $(\psi_i)$, then the sum process can be
written $V + W = \sum_i \bar\mu_i\bar Z_i\psi_i$ for $\bar\mu_i = \sqrt{\mu_i^2 + \mu_i'^2}$ and independent standard
normal variables $\bar Z_i$. The RKHS of $V + W$ is then the set of series $\sum_i w_i\psi_i$ with
coefficients $(w_i)$ satisfying $\sum_i (w_i/\bar\mu_i)^2 < \infty$ (see Section 4). Thus the RKHS is
not an orthogonal sum and, asymptotically as $i \to \infty$, the eigenvalues $(\bar\mu_i)^2$, which
determine the presence of the directions $\psi_i$ in the RKHS, are determined by the
slowest of the two sequences $\mu_i$ and $\mu_i'$. If $\mu_i/\mu_i' \to 0$, then the RKHS of $V + W$ is
essentially the same as the RKHS of $W$.
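
The effect described in this counterexample can be made concrete with two hypothetical scaling sequences. The sketch below assumes polynomially decaying $\mu_i = i^{-2}$ and $\mu_i' = i^{-1}$ (choices made here purely for illustration) and compares the combined scaling $\bar\mu_i$ with the slower sequence.

```python
import numpy as np

i = np.arange(1, 10001, dtype=float)
mu_v = i ** -2.0                 # hypothetical scaling of V: fast decay
mu_w = i ** -1.0                 # hypothetical scaling of W: slow decay
mu_sum = np.sqrt(mu_v**2 + mu_w**2)   # scaling of V + W on the common basis

def sq_norm(w, mu):
    """Square RKHS norm sum_i (w_i / mu_i)^2 of the series sum_i w_i psi_i."""
    return float(np.sum((w / mu) ** 2))

w = i ** -3.0                    # coefficients of an element of all three RKHSs
ratio_tail = mu_sum[-1] / mu_w[-1]
```

With these choices `ratio_tail` is close to 1, so for large $i$ the combined scaling is governed by the slower sequence, and the norm under `mu_sum` is never larger than the norm under either component scaling.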

10. Examples 

The RKHS of standard Brownian motion, viewed as a random element in $C[0,1]$,
is well known to be the set

(10.1) $\{f: [0,1] \to \mathbb{R},\ f \in AC,\ f(0) = 0,\ \int_0^1 f'(t)^2\,dt < \infty\}$,

where $f \in AC$ is the assumption that $f$ is absolutely continuous. The RKHS inner
product is

$$\langle f, g\rangle_{\mathbb{H}} = \int_0^1 f'(t)g'(t)\,dt.$$

Lemma 10.1. The RKHS of a standard Brownian motion $W$ on $[0,1]$ is given by
(10.1) with the inner product as indicated.

Proof. We use the definition of the RKHS in Section 2.1 and the fact that the
covariance kernel of Brownian motion is given by $s \wedge t$. The RKHS is the completion
of the linear span of the functions $t \mapsto s \wedge t$ as $s$ ranges over $[0,1]$, under the inner
product determined by

$$\langle s_1 \wedge \cdot,\ s_2 \wedge \cdot\rangle_{\mathbb{H}} = s_1 \wedge s_2 = \int_0^1 (s_1 \wedge t)'(s_2 \wedge t)'\,dt,$$

where the prime denotes differentiation relative to $t$, in the sense of absolute
continuity.

The linear span of the functions $t \mapsto s \wedge t$ contains every function that is 0 at 0,
continuous, and piecewise linear on a partition $0 = s_0 < s_1 < \cdots < s_N = 1$. Indeed to
obtain such a function with slopes $\alpha_1, \ldots, \alpha_N$ on the intervals $(s_0, s_1), \ldots, (s_{N-1},
s_N)$, first determine the coefficient of $s_N \wedge \cdot$ to have a correct slope on $(s_{N-1}, s_N)$,
next determine the coefficient of $s_{N-1} \wedge \cdot$ to have a correct slope on $(s_{N-2}, s_{N-1})$,
etc. The derivatives of these functions are piecewise constant, and the set of piecewise
constant functions is dense in $L_2[0,1]$. □
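
The backward recursion in the proof can be written out explicitly. The following sketch (with hypothetical names, knots and slopes) computes the coefficients $c_j$ of $f = \sum_j c_j (s_j \wedge \cdot)$ from prescribed slopes, and compares the square norm computed from the kernel Gram matrix with $\int_0^1 f'(t)^2\,dt$.

```python
import numpy as np

def min_kernel_coefficients(slopes):
    """Coefficients c_j with f = sum_j c_j (s_j ^ .) having the given slopes.

    On (s_{j-1}, s_j) the function t -> s_i ^ t has slope 1 if i >= j and 0
    otherwise, so slope_j = c_j + ... + c_N. Solve from the last interval back.
    """
    slopes = np.asarray(slopes, dtype=float)
    c = slopes.copy()
    c[:-1] -= slopes[1:]          # c_j = slope_j - slope_{j+1}, c_N = slope_N
    return c

knots = np.array([0.25, 0.5, 1.0])          # s_1 < ... < s_N = 1 (s_0 = 0)
slopes = np.array([2.0, -1.0, 0.5])         # slopes on (0,.25), (.25,.5), (.5,1)
c = min_kernel_coefficients(slopes)

gram = np.minimum.outer(knots, knots)       # K(s_i, s_j) = s_i ^ s_j
sq_norm_kernel = c @ gram @ c               # kernel inner product on the span
widths = np.diff(np.concatenate(([0.0], knots)))
sq_norm_derivative = np.sum(slopes**2 * widths)   # integral of f'(t)^2
```

The agreement of the two square norms on such piecewise linear functions is the identity that the completion argument extends to all of (10.1).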

Given the RKHS of Brownian motion it is now easy to derive the RKHS of 
several processes related to it. 

• To release Brownian motion at zero, we may start it at an independent standard
normal variable $Z$, giving the process $t \mapsto Z + W_t$. The RKHS of the constant
process $t \mapsto Z$ are the constant functions, which have trivial intersection with the
RKHS of Brownian motion. A given function $f: [0,1] \to \mathbb{R}$ can be decomposed as
$f = f(0) + (f - f(0))$, where the second part is in the RKHS of Brownian motion
if it is absolutely continuous with square integrable derivative. By Lemma 9.1,
the RKHS of $Z + W$ is the set of all absolutely continuous functions $f: [0,1] \to \mathbb{R}$
with square integrable derivative, equipped with the inner product
$\langle f, g\rangle_{\mathbb{H}} = f(0)g(0) + \int_0^1 f'(s)g'(s)\,ds$.

• To smooth Brownian motion we may consider its $k$-fold integral $I_{0+}^kW$, where
$(I_{0+}f)(t) = \int_0^t f(s)\,ds$ and $I_{0+}^k = I_{0+}^{k-1}I_{0+}$. Taking a primitive is a continuous,
linear, one-to-one map from $C[0,1]$ to $C[0,1]$, and hence by Lemma 7.1 the RKHS
of $I_{0+}^kW$ is the set of functions $I_{0+}^kf$ for $f$ in the RKHS of Brownian motion,
equipped with the inner product $\langle I_{0+}^kf, I_{0+}^kg\rangle_{\mathbb{H}} = \int_0^1 f'(s)g'(s)\,ds$. This space
can be described simply as the set of all functions $f: [0,1] \to \mathbb{R}$ that are $k$-times
differentiable with an absolutely continuous $k$th derivative with square-integrable
$f^{(k+1)}$ and that vanish at zero together with their first $k$ derivatives, equipped with
the inner product $\langle f, g\rangle_{\mathbb{H}} = \int_0^1 f^{(k+1)}(s)g^{(k+1)}(s)\,ds$.

• The sample paths of $k$-fold integrated Brownian motion $I_{0+}^kW$ have $k$ vanishing
derivatives at zero, which negatively affects its approximation properties to
smooth functions. (See Example 10.1 below.) We can release the derivatives by
adding a polynomial and considering the process $t \mapsto \sum_{i=0}^k Z_it^i/i! + (I_{0+}^kW)_t$,
for $Z_0, \ldots, Z_k$ i.i.d. standard normal variables, independent of $W$. The supports
of the polynomial process $t \mapsto \sum_{i=0}^k Z_it^i/i!$ and $I_{0+}^kW$ in $C[0,1]$ do not
have a trivial intersection, and hence we cannot apply Lemma 9.1 in that setting.
However, we may consider these processes as Borel measurable random
elements in the space $C^k[0,1]$ of $k$-times differentiable functions, equipped
with the norm $\|f\| = \|f\|_\infty + \|f^{(k)}\|_\infty$. According to Lemma 8.1, this does
not change the RKHS. The support of the process $I_{0+}^kW$ in $C^k[0,1]$ contains
only functions with $k$ vanishing derivatives at 0, and hence does have trivial
intersection with the support of the polynomial process $t \mapsto \sum_{i=0}^k Z_it^i/i!$,
which is the set of $k$th degree polynomials. Applied in this setting Lemma 9.1
yields that the RKHS of the process $t \mapsto \sum_{i=0}^k Z_it^i/i! + (I_{0+}^kW)_t$ is the set
of functions $f: [0,1] \to \mathbb{R}$ that are $k$-times differentiable with an absolutely
continuous $k$th derivative with square-integrable $f^{(k+1)}$, equipped with the inner
product $\langle f, g\rangle_{\mathbb{H}} = \sum_{i=0}^k f^{(i)}(0)g^{(i)}(0) + \int_0^1 f^{(k+1)}(s)g^{(k+1)}(s)\,ds$. To see the
latter, note that any $f$ can be uniquely written as $f = P_k + (f - P_k)$ for
$P_k(t) = \sum_{i=0}^k f^{(i)}(0)t^i/i!$ the $k$th degree Taylor polynomial and $f - P_k$ a function
with $k$ vanishing derivatives at zero. The polynomial $P_k$ is contained in
the RKHS of the polynomial process $t \mapsto \sum_{i=0}^k Z_it^i/i!$ with square RKHS-norm
$\sum_{i=0}^k f^{(i)}(0)^2$ by Example 4.2, and the function $f - P_k$ is contained in the RKHS
of $I_{0+}^kW$ by the preceding.

The preceding can be extended to fractional integrals of Brownian motion.
Rather than studying the fractional integral operator in detail, we give a direct
derivation of the RKHSs. For $\alpha > 0$ and $W$ a standard Brownian motion the
Riemann-Liouville process with Hurst parameter $\alpha > 0$ is defined as

$$R_t^\alpha = \int_0^t (t - s)^{\alpha - 1/2}\,dW_s, \qquad t \ge 0.$$

The process $R^\alpha$ is a centered Gaussian process with continuous sample paths. It
can be viewed as a multiple of the $(\alpha + 1/2)$-fractional integral of the "derivative
$dW$ of Brownian motion". For $\alpha > 0$ and a (deterministic) measurable function $f$
on $[0,1]$ the (left-sided) Riemann-Liouville fractional integral of $f$ of order $\alpha$ (if it
exists) is defined as (cf. [13])

$$I_{0+}^\alpha f(t) = \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}f(s)\,ds.$$

For $\alpha$ a natural number, the function $I_{0+}^\alpha f$ is just the $\alpha$-fold iterated integral of $f$,
and for $\alpha > 1/2$ the Riemann-Liouville process is equal to $\Gamma(\alpha + 1/2)I_{0+}^{\alpha - 1/2}W$ for
$I_{0+}^{\alpha - 1/2}$ the fractional integral.
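
The fractional integral can be approximated numerically for bounded kernels. The following is a minimal sketch using a midpoint rule; the function name, the number of grid points and the test function $f \equiv 1$ are assumptions made here, and the rule is only intended for $\alpha \ge 1$.

```python
import math
import numpy as np

def rl_fractional_integral(f, alpha, t, n=20000):
    """Midpoint-rule approximation of (1/Gamma(alpha)) * int_0^t (t-s)^(alpha-1) f(s) ds.

    Intended for alpha >= 1, where the kernel is bounded; for alpha < 1 the
    kernel is singular at s = t and this simple rule converges slowly.
    """
    s = (np.arange(n) + 0.5) * (t / n)        # midpoints of n equal subintervals
    values = (t - s) ** (alpha - 1.0) * f(s)
    return float(values.sum() * (t / n) / math.gamma(alpha))

one = lambda s: np.ones_like(s)
# alpha = 2 gives the twice iterated integral of 1, i.e. t^2 / 2.
twice = rl_fractional_integral(one, 2.0, 0.8)
# For f = 1 the closed form is t^alpha / Gamma(alpha + 1).
frac = rl_fractional_integral(one, 1.5, 0.8)
exact = 0.8 ** 1.5 / math.gamma(2.5)
```

For integer $\alpha$ the result matches the iterated integral, and for $\alpha = 3/2$ it matches the closed form for constant $f$ up to discretization error.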



Lemma 10.2. The RKHS of the Riemann-Liouville process with parameter $\alpha > 0$
viewed as a random element in $C[0,1]$ is $\mathbb{H} = I_{0+}^{\alpha+1/2}(L_2[0,1])$ and the RKHS-norm
is given by

$$\|I_{0+}^{\alpha+1/2}f\|_{\mathbb{H}} = \frac{\|f\|_2}{\Gamma(\alpha + 1/2)}.$$

Proof. We use the characterization of the RKHS as the completion of the functions
(2.1) under the inner product (2.2). With $f_s$ the function defined by
$f_s(u) = (s - u)_+^{\alpha - 1/2}$, we have, for all $s, t \ge 0$ and $\langle\cdot,\cdot\rangle_2$ the inner product of $L_2[0,1]$,

$$\mathrm{E}R_s^\alpha R_t^\alpha = \int_0^1 (t - u)_+^{\alpha - 1/2}(s - u)_+^{\alpha - 1/2}\,du = \langle f_t, f_s\rangle_2 = \Gamma(\alpha + 1/2)\,I_{0+}^{\alpha+1/2}f_s(t).$$

Hence every simple element of $\mathbb{H}$ of the form (2.1) is given by $I_{0+}^{\alpha+1/2}f$ for some
$f \in L_2[0,1]$. Moreover, the inner product (2.2) of two such elements $I_{0+}^{\alpha+1/2}f$ and
$I_{0+}^{\alpha+1/2}g$ is given by

(10.2) $\langle I_{0+}^{\alpha+1/2}f, I_{0+}^{\alpha+1/2}g\rangle_{\mathbb{H}} = \dfrac{\langle f, g\rangle_2}{\Gamma^2(\alpha + 1/2)}$.

It follows that the RKHS $\mathbb{H}$ is a subspace of the Hilbert space obtained by endowing
$I_{0+}^{\alpha+1/2}(L_2[0,1])$ with the inner product (10.2). To prove the converse inclusion,
suppose that $g \in I_{0+}^{\alpha+1/2}(L_2[0,1])$ is orthogonal to $\mathbb{H}$. Then $g = I_{0+}^{\alpha+1/2}f$ for some
$f \in L_2[0,1]$ and $g$ is, in particular, orthogonal to every element $I_{0+}^{\alpha+1/2}f_t$ of $\mathbb{H}$.
Hence, for every $t \in [0,1]$,

$$0 = \langle I_{0+}^{\alpha+1/2}f, I_{0+}^{\alpha+1/2}f_t\rangle_{\mathbb{H}} = \frac{\langle f, f_t\rangle_2}{\Gamma^2(\alpha + 1/2)} = \frac{I_{0+}^{\alpha+1/2}f(t)}{\Gamma(\alpha + 1/2)}.$$

The injectivity of the operator $I_{0+}^{\alpha+1/2}: L_2[0,1] \to L_2[0,1]$ (see [13], Theorem 13.1)
then implies that $f = 0$, whence $g = 0$. We conclude that $\mathbb{H} = I_{0+}^{\alpha+1/2}(L_2[0,1])$, and
the inner product on $\mathbb{H}$ is given by (10.2). □

Example 10.1. The process $t \mapsto Z + \int_0^t W_s\,ds$, for $Z$ a standard normal variable
and $W$ an independent Brownian motion, has sample paths of regularity 3/2 and
can take any value at 0, but the derivative at 0 is 0. We shall show that the latter
makes the process inappropriate as a prior model for 3/2-smooth functions.

By similar arguments as before the RKHS $\mathbb{H}$ of the process can be seen to be
the set of all functions $h: [0,1] \to \mathbb{R}$ with absolutely continuous derivative such that
$\int_0^1 h''(s)^2\,ds < \infty$ and $h'(0) = 0$, with square norm

$$\|h\|_{\mathbb{H}}^2 = \|h''\|_2^2 + h(0)^2.$$

We shall show that for the identity function id we have

$$\inf\{\|h\|_{\mathbb{H}}^2: h \in \mathbb{H},\ \|h - \mathrm{id}\|_\infty < \varepsilon\} \gtrsim \frac{1}{\varepsilon}.$$

This may be contrasted with the approximation by the RKHS of the process
$t \mapsto Z_0 + Z_1t + \int_0^t W_s\,ds$, which is of order $(1/\varepsilon)^{2/3}$ for every function in $C^{3/2}[0,1]$ (see
[15]).

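To prove the claim note that $\|h - \mathrm{id}\|_\infty < \varepsilon$ implies that $h(3\varepsilon) - h(0) \ge \varepsilon$.
Therefore the quantity in the display is bounded below by

$$\inf\Bigl\{\int_0^1 h''(s)^2\,ds:\ h(3\varepsilon) - h(0) \ge \varepsilon,\ h'(0) = 0\Bigr\}.$$

For a given $h$ as in the display we can define $g$ by

$$g(y) = \frac{h(3\varepsilon y) - h(0)}{\varepsilon}.$$

Then $g'(y) = 3h'(3\varepsilon y)$, $g''(y) = 9h''(3\varepsilon y)\varepsilon$, and

$$g(0) = 0,\quad g'(0) = 0,\quad g(1) \ge 1,$$
$$\int_0^1 h''(s)^2\,ds = \int_0^1 g''\bigl(s/(3\varepsilon)\bigr)^2\frac{1}{81\varepsilon^2}\,ds = \int_0^{1/(3\varepsilon)} g''(u)^2\frac{1}{27\varepsilon}\,du.$$

Thus, for $3\varepsilon \le 1$, the preceding display is bigger than

$$\frac{1}{27\varepsilon}\inf\Bigl\{\int_0^1 g''(u)^2\,du:\ g(1) \ge 1,\ g(0) = g'(0) = 0\Bigr\}.$$

The infimum is nonzero, because $g'' = 0$ implies that $g$ is a linear function, hence
identically 0 because $g(0) = g'(0) = 0$, contradicting $g(1) \ge 1$.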